
I came across the following note in the API reference documentation for the Search API:

Note: Search results are constrained to a maximum of 500 videos if your request specifies a value for the channelId parameter and sets the type parameter value to video, [...].

Do I have to apply for a paid account to break the 500-video limit? If so, how do I apply?

Asked by Johnny.
  • No, there's no official way of breaking the 500-video limit specified by the official docs. But the API is feature-rich, so do update your question to mention what you want to achieve, so that I can direct you further. Also mention which programming environment you're using. (Note that the API can in fact be used from a plethora of modern programming environments/languages via publicly available Google API libraries.) – stvar Nov 06 '20 at 08:12
  • @stvar Thanks for the reply. I am gathering all the YouTube video metadata for reporting purposes. I am using Python to access both the Data and Analytics APIs. For each video, I am interested in viewCount, likeCount, dislikeCount, duration, estimatedMinutesWatched, averageViewDuration, etc. – Johnny Nov 06 '20 at 18:22

1 Answer


If you need to obtain the list of all videos of a given channel (identified by its ID, say CHANNEL_ID), then proceed as follows:

Step 1: Query the Channels.list API endpoint with the parameter id=CHANNEL_ID to obtain the ID of that channel's uploads playlist:

response = youtube.channels().list(
    id = CHANNEL_ID,
    part = 'contentDetails',
    fields = 'items(contentDetails(relatedPlaylists(uploads)))',
    maxResults = 1
).execute()

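# The fields filter keeps the top-level 'items' list;
# the channel object is its first (and only) element.
uploads_id = response \
    ['items'][0] \
    ['contentDetails'] \
    ['relatedPlaylists'] \
    ['uploads']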

The code above needs to run only once, to obtain the uploads playlist ID as uploads_id; that ID can then be reused as many times as needed.
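Because of the fields filter in Step 1, the parsed Channels.list response has the shape {'items': [{'contentDetails': {'relatedPlaylists': {'uploads': ...}}}]}, so the uploads ID sits under items[0]. A slightly more defensive extraction, sketched below, also copes with a CHANNEL_ID that matches no channel; the helper name extract_uploads_id and the sample response are hypothetical, for illustration only.

```python
def extract_uploads_id(response):
    """Return the uploads playlist ID from a parsed Channels.list response,
    or None when no channel was found ('items' empty or absent)."""
    # The channel object, if any, is the first element of 'items'.
    items = response.get('items', [])
    if not items:
        return None
    return items[0]['contentDetails']['relatedPlaylists']['uploads']


# Hypothetical sample response, shaped as the fields filter would return it.
sample = {
    'items': [
        {'contentDetails': {'relatedPlaylists': {'uploads': 'UUabc'}}}
    ]
}
print(extract_uploads_id(sample))  # UUabc
print(extract_uploads_id({}))      # None
```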

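Usually, a channel ID and its corresponding uploads playlist ID are related by the substitution s/^UC([0-9a-zA-Z_-]{22})$/UU\1/, i.e. the uploads playlist ID is the channel ID with its leading UC replaced by UU.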
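The same substitution can be written in Python as below. This is only a shortcut based on the usual UC/UU convention, not a documented guarantee, so Step 1 remains the reliable way; the function name guess_uploads_id is hypothetical.

```python
import re

# Python equivalent of s/^UC([0-9a-zA-Z_-]{22})$/UU\1/.
_CHANNEL_ID_RE = re.compile(r'UC([0-9a-zA-Z_-]{22})')


def guess_uploads_id(channel_id):
    """Return the likely uploads playlist ID for channel_id, or None if
    channel_id does not have the usual 'UC' + 22 characters shape."""
    # fullmatch plays the role of the ^...$ anchors (and rejects a trailing newline).
    match = _CHANNEL_ID_RE.fullmatch(channel_id)
    if match is None:
        return None
    return 'UU' + match.group(1)


print(guess_uploads_id('UC_x5XG1OV2P6uZZ5FSM9Ttw'))  # UU_x5XG1OV2P6uZZ5FSM9Ttw
print(guess_uploads_id('not-a-channel-id'))          # None
```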

Step 2: Using the previously obtained uploads playlist ID (let's name it UPLOADS_ID), query the PlaylistItems.list API endpoint to obtain the list of all video IDs in that playlist:

is_video = lambda item: \
    item['snippet']['resourceId']['kind'] == 'youtube#video'
video_id = lambda item: \
    item['snippet']['resourceId']['videoId']

request = youtube.playlistItems().list(
    playlistId = UPLOADS_ID,
    part = 'snippet',
    fields = 'nextPageToken,items(snippet(resourceId))',
    maxResults = 50
)
videos = []

while request:
    response = request.execute()

    items = response.get('items', [])

    videos.extend(map(video_id, filter(is_video, items)))

    request = youtube.playlistItems().list_next(
        request, response)

After running the code above, the list videos will contain the IDs of all videos uploaded to the channel identified by CHANNEL_ID (with an API key, only the public ones; see the comments below).
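The pagination logic of Step 2 can be exercised offline. The sketch below restates the same loop under an assumption: fetch_page(page_token) is a hypothetical callable that performs one PlaylistItems.list call and returns the parsed response. Here it is fed two fake pages instead of real API data.

```python
def video_ids_from_pages(fetch_page):
    """Collect video IDs across pages of PlaylistItems.list responses.

    fetch_page(page_token) is a hypothetical stand-in for one API call; it
    returns a dict with 'items' and, on all but the last page, 'nextPageToken'.
    """
    videos = []
    page_token = None  # None requests the first page
    while True:
        response = fetch_page(page_token)
        for item in response.get('items', []):
            resource = item['snippet']['resourceId']
            # Keep only entries that refer to videos.
            if resource['kind'] == 'youtube#video':
                videos.append(resource['videoId'])
        page_token = response.get('nextPageToken')
        if not page_token:  # no token means this was the last page
            return videos


# Two fake pages keyed by page token.
pages = {
    None: {'nextPageToken': 'p2',
           'items': [{'snippet': {'resourceId': {'kind': 'youtube#video', 'videoId': 'v1'}}},
                     {'snippet': {'resourceId': {'kind': 'youtube#channel', 'channelId': 'c1'}}}]},
    'p2': {'items': [{'snippet': {'resourceId': {'kind': 'youtube#video', 'videoId': 'v2'}}}]},
}
print(video_ids_from_pages(pages.get))  # ['v1', 'v2']
```

With the real client, fetch_page could wrap youtube.playlistItems().list(...).execute(), adding pageToken=page_token from the second page onwards; list_next in the loop above does that nextPageToken bookkeeping for you.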

Step 3: Query the Videos.list API endpoint to obtain the statistics object of each of the videos you're interested in:

class Stat:

    def __init__(self, video_id, view_count, like_count):
        self.video_id = video_id
        self.view_count = view_count
        self.like_count = like_count

stats = []

while len(videos):
    ids = videos[0:50]
    del videos[0:50]

    response = youtube.videos().list(
        id = ','.join(ids),
        part = 'id,statistics',
        fields = 'items(id,statistics)',
        maxResults = len(ids)
    ).execute()

    items = response['items']
    assert len(items) == len(ids)

    for item in items:
        stat = item['statistics']
        stats.append(
            Stat(
                video_id = item['id'],
                view_count = stat['viewCount'],
                # likeCount can be missing from the response
                like_count = stat.get('likeCount')
            )
        )

Note that, if the list videos has length N, the code above reduces the number of calls to Videos.list from N to math.floor(N / 50) + (1 if N % 50 else 0), i.e. math.ceil(N / 50). That's because the id parameter of the Videos.list endpoint can be specified as a comma-separated list of video IDs (at most 50 IDs per list).
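The batching and the call count can be checked in isolation. The sketch below splits the IDs into chunks of at most 50 without emptying the input list (unlike the del-based loop), and computes ceil(N / 50) with integer arithmetic; the helper names are hypothetical.

```python
def batches(ids, size=50):
    """Split ids into consecutive chunks of at most `size` IDs,
    leaving the input list unchanged."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def videos_list_calls(n, size=50):
    """Number of Videos.list calls needed for n IDs: ceil(n / size)."""
    return (n + size - 1) // size


ids = [f'id{i}' for i in range(120)]
print([len(chunk) for chunk in batches(ids)])  # [50, 50, 20]
print(videos_list_calls(120))                  # 3
print(','.join(batches(ids)[2][:2]))           # id100,id101
```

Each chunk, joined with ',', is what goes into the id parameter of one Videos.list call.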

Note also that each piece of code above uses the fields request parameter to obtain from the invoked API endpoints only the info that is actually used.
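For reporting, it may help to normalize the statistics returned by Videos.list: the counts arrive as JSON strings, and a count such as likeCount can be missing from the response. A minimal sketch under those assumptions, with a hypothetical function name and sample item:

```python
def parse_statistics(item):
    """Turn one Videos.list item (requested with part='id,statistics')
    into a dict of ints; counts missing from the response become None."""
    stats = item.get('statistics', {})

    def count(name):
        # Counts are strings in the JSON response, e.g. '1024'.
        value = stats.get(name)
        return int(value) if value is not None else None

    return {
        'video_id': item['id'],
        'view_count': count('viewCount'),
        'like_count': count('likeCount'),
    }


sample = {'id': 'v1', 'statistics': {'viewCount': '1024'}}
print(parse_statistics(sample))
# {'video_id': 'v1', 'view_count': 1024, 'like_count': None}
```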


I must also mention that, according to YouTube's staff, there is a by-design upper limit of 20,000 on the number of items returned via the PlaylistItems.list endpoint. This is unfortunate, but a fact.

Answered by stvar.
  • Thanks for the steps. That helps a lot. I am having a little issue combining the Data and Analytics APIs: when I try to merge the Analytics metadata into the Data part, it gets messed up. I will ask you later once I can read all the videos. – Johnny Nov 06 '20 at 19:31
  • In step 2, the value returned was empty. Does it matter that I am not the original account that uploaded the videos? I have been given the owner role on the channel. – Johnny Nov 10 '20 at 18:30
  • Please post that `UU...` ID so that I can try it myself. – stvar Nov 10 '20 at 18:31
  • If you're the account owner *and* are passing authorized credentials (i.e. a valid access token) to the API, then the response will include all uploaded video IDs. Otherwise (you're not the owner *or* you're accessing the endpoint using an API key), you'll get only the IDs of the videos that are *public*. – stvar Nov 10 '20 at 18:34
  • I was accessing the endpoint using an API key. You were right about it. If I am using Google OAuth 2.0, what scopes do I need to use? I am new to Google OAuth 2.0. When using the OAuth, is there a way to bypass the cut and paste authorize URL link to get the code? Thanks! – Johnny Nov 12 '20 at 16:55
  • Here is my OAuth code: SCOPES = ['https://www.googleapis.com/auth/youtube','https://www.googleapis.com/auth/youtube.download'] API_SERVICE_NAME = 'youtube' API_VERSION = 'v3' CLIENT_SECRETS_FILE = './client_secrets.json' def get_authenticated_service(): flow = InstalledAppFlow.from_client_secrets_file(CLIENT_SECRETS_FILE, SCOPES) credentials = flow.run_console() return build(API_SERVICE_NAME, API_VERSION, credentials = credentials) In Main: youtube = get_authenticated_service() – Johnny Nov 12 '20 at 16:57
  • With respect to the issue of *bypassing the cut and paste authorize URL link to get the code*, the answer is yes, that's very much possible. You need *only one successful* OAuth 2.0 authentication/authorization flow; the credentials data obtained (your object `credentials`) can then be saved to a local file, so that the `n`-th invocation of your program, with `n >= 2`, avoids the OAuth flow altogether by loading the credentials data from that local file. See the code contained in [my answer to this very question](https://stackoverflow.com/a/64719550/8327971). – stvar Nov 12 '20 at 17:37
  • With respect to your API scopes, yes, `https://www.googleapis.com/auth/youtube` is OK, since it is almost all-encompassing; see them all [here](https://developers.google.com/identity/protocols/oauth2/scopes#youtube). – stvar Nov 12 '20 at 17:41
  • I will check out your answer for bypassing the authentication flow. That is very helpful. Thank you so much for your help! – Johnny Nov 12 '20 at 17:48
  • I had been struggling for a few weeks trying to figure out what went wrong in step 2. Finally, I found out that changing id to playlistId in youtube.videos().list fixes it, and now I can access over a thousand videos. Also, your script for bypassing authorization works perfectly. Thank you so much for your help. – Johnny Nov 25 '20 at 23:31
  • @Johnny Tan: Step 2 above is not about using `youtube.videos().list()`, but `youtube.playlistItems().list(...)`, right? I'm sorry for misleading you: indeed I should have used [`playlistId`](https://developers.google.com/youtube/v3/docs/playlistItems/list#playlistId) and not [`id`](https://developers.google.com/youtube/v3/docs/playlistItems/list#id). This is unfortunate, because I myself [have been using `playlistId` for quite a long time](https://github.com/stvar/youtube-data). I've corrected my answer above. Again, please excuse me for causing you so much trouble! – stvar Nov 26 '20 at 09:23
  • No worries, you didn't cause me any trouble. I am new to the Google APIs and OAuth; it is a learning curve for me. I really appreciate your help. Without it, I would be going nowhere. I have learned a lot from you. Thanks! – Johnny Nov 26 '20 at 18:54
  • Hi, in your code for bypassing OAuth 2.0 ([link](https://stackoverflow.com/a/64719550/8327971)), how do I access OAuth through a proxy server? – Johnny Feb 02 '21 at 02:51
  • @Johnny Tan: Have you encountered issues running that code through a proxy server? – stvar Feb 02 '21 at 11:11
  • Yes, I did. If I run it outside the proxy server or VPN, it works fine. I tried setting http_proxy at the command line, but it doesn't work. I also read up on httplib2, but I don't know how to integrate it into your code, and I am not sure httplib2 will work. Thanks! – Johnny Feb 02 '21 at 18:37
  • So your own browser is set up to access the Internet via a proxy? (All YouTube Data API calls, including the ones involved in OAuth 2 flows, boil down to HTTP requests.) – stvar Feb 02 '21 at 18:45
  • But since OAuth flows depend on the browser, you'll need your browser set up so that it accesses the Internet correctly. – stvar Feb 02 '21 at 19:00
  • I understand now. I will try setting it up on my browser and will let you know if I encounter any other issue. Thanks for the help. – Johnny Feb 02 '21 at 19:07
  • What is the purpose of the line `assert len(items) <= 50`? It seems unnecessary to me. – audiomason Feb 24 '22 at 01:33
  • @audiomason: Well, I do not remember why I particularly added that line of code. Looking into the code itself and, also, into the [history of this post](https://stackoverflow.com/posts/64720048/revisions), I found no specific reason for why that assert is *necessary*. Therefore, I deem it as superfluous indeed. – stvar Feb 24 '22 at 12:54
  • Very useful post, thanks! I get a KeyError 'contentDetails' for the code ```uploads_id = response \ ['contentDetails'] \ ['relatedPlaylists'] \ ['uploads'] ``` in step one. Trying to figure out a workaround for the nested data. – Simone Apr 27 '22 at 14:06
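On the issue of merging Analytics metrics into the Data API results mentioned in the comments, one approach is to key both sides by video ID. The sketch below assumes a YouTube Analytics reports.query response for dimensions='video', parsed into a dict with 'columnHeaders' (each having a 'name') and 'rows'; the function name and sample data are hypothetical.

```python
def merge_by_video_id(data_stats, analytics_report):
    """Merge per-video Data API stats with Analytics report rows.

    data_stats: list of dicts, each with a 'video_id' key.
    analytics_report: parsed report assumed to hold 'columnHeaders' and
        'rows', including a 'video' column (dimensions='video').
    Returns a dict mapping each video ID to its combined fields.
    """
    names = [header['name'] for header in analytics_report.get('columnHeaders', [])]
    merged = {s['video_id']: dict(s) for s in data_stats}
    for row in analytics_report.get('rows', []):
        record = dict(zip(names, row))  # column name -> value
        video_id = record.pop('video')
        # Videos present only in the report still get an entry.
        merged.setdefault(video_id, {'video_id': video_id}).update(record)
    return merged


data = [{'video_id': 'v1', 'view_count': 1024}]
report = {
    'columnHeaders': [{'name': 'video'},
                      {'name': 'estimatedMinutesWatched'},
                      {'name': 'averageViewDuration'}],
    'rows': [['v1', 300, 42], ['v2', 10, 5]],
}
print(merge_by_video_id(data, report)['v1'])
# {'video_id': 'v1', 'view_count': 1024, 'estimatedMinutesWatched': 300, 'averageViewDuration': 42}
```

Keying on the ID, rather than relying on the two APIs returning videos in the same order, keeps rows from getting mismatched.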