Hi Team,
I am working on an assignment and am using https://api.themoviedb.org/3/discover/movie to download the complete dataset of movies. On the first call, the API returns "total_pages": 38719, "total_results": 774367.
I am calling the API recursively, adding the page parameter to every call. But after 61 pages the API returns the error "cURL error 56: OpenSSL SSL_read: Connection reset by peer".
Can you please suggest how we can get the complete data from the API?
Thanks
Reply by robbie3999
on June 14, 2023 at 10:23 AM
Hi @netsmartz, there are several things to look at here.
This API isn't designed for a user to download everything. Queries are limited to a maximum of 500 pages. If you try to download any page > 500 you will get an error. In theory you could break the query up into smaller pieces, for example by year, but that would be up to you. Another option is the daily file exports, but these only have a bare minimum of information about each title.
Maybe this is just a matter of semantics, but I'm not sure why you would do this using recursion. Most people would use a simple loop in synchronous code, or start a fixed number of tasks in asynchronous code. Recursion would mean you are nesting a call for each page, so you would have to nest the calls 38719 levels deep. There is some basic rate limiting on this site, I believe something like 40 calls per second and 20 simultaneous calls per IP address, so you could be hitting one of these limits with a recursive method. Or it could simply be a transient, one-time problem.
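To make the loop-plus-smaller-queries idea concrete, here is a minimal Python sketch. It is not an official client. It assumes the `api_key` and `primary_release_year` query parameters are what you want for authentication and for slicing by year, and the names `fetch_page`, `collect_year`, `MAX_PAGE` and the 0.05 second delay are made up for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.themoviedb.org/3/discover/movie"
MAX_PAGE = 500  # page cap described in the reply above


def fetch_page(api_key, year, page):
    """Fetch one page of results for one release year (real HTTP call)."""
    query = urllib.parse.urlencode({
        "api_key": api_key,            # assumption: key passed as a query parameter
        "primary_release_year": year,  # assumption: filter used to slice by year
        "page": page,
    })
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=30) as resp:
        return json.load(resp)


def collect_year(year, fetch, delay=0.05):
    """Collect all reachable pages for one year with a plain loop, no recursion.

    `fetch` is any callable taking (year, page) and returning the decoded
    JSON response as a dict with "results" and "total_pages".
    """
    results = []
    page = 1
    while True:
        data = fetch(year, page)
        results.extend(data["results"])
        # Never ask for a page beyond the cap, whatever total_pages says.
        last_page = min(data["total_pages"], MAX_PAGE)
        if page >= last_page:
            break
        page += 1
        time.sleep(delay)  # small pause between calls; the value is a guess
    return results
```

A possible way to call it, one year at a time: `collect_year(2020, lambda y, p: fetch_page("YOUR_API_KEY", y, p))`. If a single year still reports more than 500 pages, the slice would need to be narrowed further.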
Reply by netsmartz
on June 15, 2023 at 2:05 AM
Thanks Team, for the update.
Our model is working like this:
Reply by robbie3999
on June 15, 2023 at 10:35 AM
My suggestion would be to run this at least several times to look for three possibilities.
You might also add the "-v" option to curl to see more output and compare the output when it works and when it doesn't. Also it might be useful to know exactly what API call you are making. Mask out your API key before you post it here.
Reply by netsmartz
on June 15, 2023 at 1:30 PM
It always happens exactly after page 61. Also, how can I get the data for pages > 500?
Reply by robbie3999
on June 15, 2023 at 4:37 PM
You can't. You would have to run different queries to get more results. Some options are explained in the first paragraph of the first reply.
I would first try running the query starting on the first page that fails, for example "...query...&page=62". That will determine whether the problem is that specific page or the fact that you ran 61 queries before it. You might also add the "-v" option to curl to see more output and compare the output when it works and when it doesn't. It would also be useful to know exactly what API call you are making. Mask out your API key before you post it here. It would also help to see the section of code where you are running the queries.
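If it helps, the two suggestions above (test the failing page on its own, and mask the key before posting the call) could be sketched in Python roughly as follows. The helper names `mask_api_key` and `try_single_page` are hypothetical, and the sketch assumes the key is passed as an `api_key` query parameter.

```python
import urllib.parse


def mask_api_key(url, placeholder="XXXX"):
    """Return the URL with the api_key value replaced, so it is safe to post."""
    parts = urllib.parse.urlsplit(url)
    pairs = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
    masked = [(k, placeholder if k == "api_key" else v) for k, v in pairs]
    return urllib.parse.urlunsplit(
        parts._replace(query=urllib.parse.urlencode(masked))
    )


def try_single_page(fetch, page):
    """Call fetch(page) once and report (page, succeeded, error text)."""
    try:
        fetch(page)
        return (page, True, "")
    except Exception as exc:  # broad on purpose: this is only a diagnostic
        return (page, False, f"{type(exc).__name__}: {exc}")
```

Running `try_single_page` on page 62 in a fresh process, before any other request, would show whether that page fails by itself or only after the 61 earlier calls.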