
My goal is to extract all videos from a playlist that can hold many videos, from ~3000 up to more than 5000. With maxResults=50 and pagination implemented via nextPageToken, I'm only able to call the API 20 times, after which nextPageToken is no longer sent with the response.

I'm calling the API from a Python application. I have a while loop that runs until nextPageToken is absent from the response; ideally this should happen AFTER all the videos are extracted, but the loop exits prematurely after calling the API 19-20 times:

def main():
    youtube = get_authorised_youtube()  # returns YouTube resource authorized with OAuth.

    first_response = make_single_request(youtube, None)  # make_single_request() takes in the youtube resource and nextPageToken, if any.
    nextPageToken = first_response["nextPageToken"]

    try:
        count = 0
        while True:
            response = make_single_request(youtube, nextPageToken)
            nextPageToken = response["nextPageToken"]  # raises KeyError on the last page

            count += 1
            print(count, end=" ")
            print(nextPageToken)
    except KeyError:  # raised when nextPageToken is absent from the response
        response.pop("items")
        print(response)  # prints the last response for analysis


if __name__ == '__main__':
    main()

Snippet of make_single_request():

def make_single_request(youtube, nextPageToken):
    if nextPageToken is None:
        request = youtube.videos().list(
            part="id",
            myRating="like",
            maxResults=50
        )
    else:
        request = youtube.videos().list(
            part="id",
            myRating="like",
            pageToken=nextPageToken,
            maxResults=50
        )
    response = request.execute()

    return response

I expected the code to make upwards of 50 API calls, but it consistently makes only around 20.

Note: the code was executed with an unpaid GCP account. The calls use part="id", which has a quota cost of 0. The call limit according to GCP is 10,000; according to the quota on the console, I make only 20 calls.

Output:

1 CGQQAA
2 CJYBEAA
3 CMgBEAA
4 CPoBEAA
5 CKwCEAA
6 CN4CEAA
7 CJADEAA
8 CMIDEAA
9 CPQDEAA
10 CKYEEAA
11 CNgEEAA
12 CIoFEAA
13 CLwFEAA
14 CO4FEAA
15 CKAGEAA
16 CNIGEAA
17 CIQHEAA
18 CLYHEAA
19 {'kind': 'youtube#videoListResponse', 'etag': '"ETAG"', 'prevPageToken': 'CLYHEAE', 'pageInfo': {'totalResults': TOTAL_RESULTS(>4000), 'resultsPerPage': 50}}

EDIT: After changing maxResults to 20, the code makes around 50 API calls, so the total number of videos that can be extracted appears to be fixed at 1000.

Gaurav K
  • Your code does not extract all videos as you mentioned in the description text, but only those that have `.myRating == "like"`. This implies a smaller result set than the whole set of 3000 videos, thus making the pagination loop terminate earlier than you expect. – stvar May 31 '19 at 07:15
  • Hey, my goal is to extract all of my liked videos, and I have around 8000 of them. The current code is able to extract only 1000 of them. – Gaurav K May 31 '19 at 07:20
  • Hey, I still think you're wrong! Double-check your data, please, and be consistent with the numbers you put forward! I myself, not long ago, successfully paginated a playlist containing almost 17K entries -- thus obtaining from the API about 340 pages of JSON response data. – stvar May 31 '19 at 07:38
  • Here's the exact response JSON I assembled for ```part="id"``` after appending the results of each call to ```items```: ```json { "etag": "ETAG", "items": [ // video IDs of 1028 videos ], "kind": "youtube#videoListResponse", "pageInfo": { "resultsPerPage": 50, "totalResults": 8153 } }``` The sample code I referred to for getting the liked videos is here: https://developers.google.com/youtube/v3/docs/videos/list?apix=true#part — it uses the youtube.videos().list() method with myRating="like". – Gaurav K May 31 '19 at 07:52
  • Again, your numbers do not add up: 1028 / 50 == 20, 1028 % 50 == 28, therefore you should have obtained 21 pages, instead of 19 as shown by your output text above. – stvar May 31 '19 at 09:29
  • The numbers might not add up as some videos I've liked would have been deleted/privated. What do you suggest will help me extract all of the videos in the playlist? – Gaurav K May 31 '19 at 09:37

3 Answers


For obtaining the entire list of liked videos of a given channel without omissions, I suggest you use the PlaylistItems endpoint instead, queried for the given channel's liked-videos playlist by passing the proper value to the endpoint's playlistId parameter.

A given channel's liked-videos playlist ID is obtained by querying the channel's own endpoint. The needed ID is found at .items[].contentDetails.relatedPlaylists.likes.
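
A minimal sketch of that flow, assuming an already-authorized client such as the one returned by the question's get_authorised_youtube() (the function name and variable names are my own; the API methods are the documented Channels and PlaylistItems endpoints):

def fetch_all_liked_video_ids(youtube):
    # The liked-videos playlist ID lives at
    # .items[].contentDetails.relatedPlaylists.likes of the Channels response.
    channels = youtube.channels().list(
        part="contentDetails",
        mine=True
    ).execute()
    likes_playlist_id = channels["items"][0]["contentDetails"]["relatedPlaylists"]["likes"]

    # Page through PlaylistItems until the response carries no nextPageToken.
    video_ids = []
    page_token = None
    while True:
        params = dict(
            part="contentDetails",
            playlistId=likes_playlist_id,
            maxResults=50
        )
        if page_token:
            params["pageToken"] = page_token
        response = youtube.playlistItems().list(**params).execute()
        video_ids.extend(item["contentDetails"]["videoId"] for item in response["items"])
        page_token = response.get("nextPageToken")
        if page_token is None:
            return video_ids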

stvar
  • Thanks! I was able to cross the 1000 limit, but I could only extract 5000. – Gaurav K May 31 '19 at 10:41
  • According to things I have already experienced, one should not be limited to a couple of thousand entries: as mentioned above, I got well over ten thousand entries from the PlaylistItems endpoint. Review your code to assure yourself that everything is as it should be per the API docs. Also check your quotas. – stvar May 31 '19 at 10:47
  • Hey! According to the API response for the PlaylistItems endpoint, my likes playlist only contains 5000 videos, which I was able to extract as mentioned in the comment above, but the actual number is around 9000, which I was able to get from the Videos.list endpoint with the myRating="like" parameter, as mentioned in the question. Is there any other way to get all the videos? – Gaurav K Apr 11 '20 at 13:24
  • @GauravK have you ever found out? – rocky Aug 03 '22 at 22:41
  • @rocky, unfortunately not. Do let me know if you find a way to extract >5000 videos in a playlist! – Gaurav K Aug 05 '22 at 10:28
  • I wrote a migration utility to bring my data (likes, playlists, subscriptions) from one YT account to another. The only way I could migrate my 5k+ likes playlist was to continuously remove items from the source list while adding them to the target list (a sketch of this idea follows the comments). – rocky Aug 05 '22 at 21:53
  • source code: https://github.com/petrsvihlik/YouTube.AccountManager (work in progress, very dirty solution... trying to get things done fast :)) – rocky Aug 05 '22 at 22:00
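
A minimal sketch of the removal-while-copying idea from rocky's comment above, assuming two separately authorized clients (source_youtube and target_youtube are my own names), that likes_playlist_id was obtained via relatedPlaylists.likes as in the answer, and that the source likes playlist accepts playlistItems.delete; error handling and quota costs are ignored:

def migrate_likes_page(source_youtube, target_youtube, likes_playlist_id):
    # Read one page of the source likes playlist.
    page = source_youtube.playlistItems().list(
        part="id,contentDetails",
        playlistId=likes_playlist_id,
        maxResults=50
    ).execute()

    for item in page["items"]:
        video_id = item["contentDetails"]["videoId"]
        # Rating the video "like" on the target account adds it to that
        # account's liked-videos playlist.
        target_youtube.videos().rate(id=video_id, rating="like").execute()
        # Deleting the item from the source playlist shifts later entries
        # into the reachable 5000-item window.
        source_youtube.playlistItems().delete(id=item["id"]).execute()

    return len(page["items"])  # call repeatedly until this returns 0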

If the goal is to retrieve the FULL list of liked videos, there is a tedious but working way: check out this question.

You basically scrape the data from a deep-link page...

What's not mentioned in this post: after you have retrieved the video IDs, if you want more data, you can use the Videos endpoint with a list of comma-separated video IDs to get more information.

If you need inspiration for the script, this is an adjusted version of the API samples provided by YouTube.

Just adjust the credentials file path and the input path of the file retrieved by doing the web scrape.

import os

import google_auth_oauthlib.flow
import googleapiclient.discovery
import googleapiclient.errors
import json

scopes = ["https://www.googleapis.com/auth/youtube.readonly"]

def do_request(youtube, video_ids):
    #https://developers.google.com/youtube/v3/docs/videos/list
    request = youtube.videos().list(
        part='contentDetails,id,snippet,statistics',
        id=','.join(video_ids),
        maxResults=50
    )

    return request.execute()["items"]

def main(video_ids):
    # Disable OAuthlib's HTTPS verification when running locally.
    # *DO NOT* leave this option enabled in production.
    os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

    api_service_name = "youtube"
    api_version = "v3"
    client_secrets_file = "INPUTAPICREDFILEHERE./creds.json"

    # Get credentials and create an API client
    flow = google_auth_oauthlib.flow.InstalledAppFlow.from_client_secrets_file(
        client_secrets_file, scopes)
    credentials = flow.run_console()
    youtube = googleapiclient.discovery.build(
        api_service_name, api_version, credentials=credentials)

    data = {'items': []}
    current_id_batch = []
    for video_id in video_ids:
        # Flush a full batch of 50 IDs (the endpoint's maximum) before adding more.
        if len(current_id_batch) == 50:
            print(f"Fetching.. {len(data['items'])} of {len(video_ids)} videos so far")
            result = do_request(youtube, current_id_batch)
            data['items'].extend(result)
            current_id_batch = []
        current_id_batch.append(video_id)

    if current_id_batch:  # fetch the final, possibly partial batch
        result = do_request(youtube, current_id_batch)
        data['items'].extend(result)
    
    with open('./data.json', 'w') as outfile:
        outfile.write(json.dumps(data, indent=4))

if __name__ == "__main__":
    # Load the scraped {video_id: ...} mapping and fetch details for its keys.
    with open('PATHTOLIKEDVIDEOS/liked_videos.json', encoding="utf8") as f:
        liked_vids = json.load(f)
    main(list(liked_vids.keys()))
Platinium

Try waiting some time between calls, like this:

import time
time.sleep(1) # time here in seconds
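
Applied to the question's loop, that would look something like the sketch below (it reuses the question's youtube resource and make_single_request(); the one-second delay is arbitrary, and a delay alone does not, by itself, lift the observed 1000-result cap):

import time

nextPageToken = None
while True:
    response = make_single_request(youtube, nextPageToken)  # from the question
    nextPageToken = response.get("nextPageToken")
    if nextPageToken is None:
        break  # last page reached
    time.sleep(1)  # pause between pages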
Viktor Ilienko