29

I have started using v3 of the YouTube APIs on an Android device, using the Java client library. Some videos that I am interested in (educational videos, for example) have transcripts that I can access through the web interface. Is there a way to access the transcripts, if present, using the v3 APIs?

Thanks

Ali Naddaf

5 Answers

43

I had the same problem with this... and spent like a week looking for a solution until I hit this:

https://stackoverflow.com/questions/10036796/how-to-extract-subtitles-from-youtube-videos

Just do a GET request on http://video.google.com/timedtext?lang={LANG}&v={VIDEOID}. You don't need an API key, OAuth, etc. to access this.
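As a rough illustration of the request described above, here is a minimal Python sketch. It only builds the URL and parses a response body; the function names are made up for this example, and the XML layout (a `transcript` root holding `text` elements with a `start` attribute) is an assumption about what the endpoint returned, not something confirmed in this thread. Note that commenters below report the endpoint not responding for some videos.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

TIMEDTEXT_URL = "http://video.google.com/timedtext"


def timedtext_url(video_id, lang="en"):
    """Build the GET URL from the answer for one video and language."""
    return TIMEDTEXT_URL + "?" + urlencode({"lang": lang, "v": video_id})


def parse_timedtext(xml_text):
    """Return (start_seconds, text) pairs from a timedtext response body.

    Assumed layout (not confirmed by the thread):
    <transcript><text start="0.5" dur="2.1">...</text></transcript>
    An empty body, which commenters saw for some videos, gives [].
    """
    if not xml_text.strip():
        return []
    root = ET.fromstring(xml_text)
    return [
        # html.unescape handles entities that survive XML decoding.
        (float(node.get("start", "0")), html.unescape(node.text or ""))
        for node in root.iter("text")
    ]
```

To try it against a real video, the body could be fetched with `urllib.request.urlopen(timedtext_url(video_id)).read().decode("utf-8")` and passed to `parse_timedtext`.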

Dev Doomari
  • Do you have any examples of this working? I tried a couple of video IDs along with every variation of language code I could think of and didn't get a reply. – streetlogics Apr 02 '14 at 04:31
  • 3
    Worked fine for me. Excellent job! – Fernando Freitas Alves May 31 '14 at 04:41
  • 15
    This seems to work when the captions were entered manually, as opposed to being created automagically by YouTube. – popstack May 27 '15 at 14:04
  • Just add "track=asr" to your URL if you want to get the auto-generated ones – Daniel Vygolov Sep 27 '17 at 07:51
  • 2
    +1 for the information, but it is not working for me either. I uploaded a video and generated captions automatically, then tried `http://video.google.com/timedtext?lang=en&v=h2pWeot3MJY&track=asr`, but I am not getting any captions for the video and it does not seem possible. – manish1706 Jan 02 '18 at 08:46
  • 16
    None of the solutions I could find allowed me to retrieve automatically generated subtitles. Therefore I came up with a bit more complex solution. Code can be found on my GitHub if anyone is still interested: https://github.com/jdepoix/youtube-transcript-api – jdepoix Apr 20 '18 at 14:02
  • The list of tracks can be found with `http://video.google.com/timedtext?type=list&v={video_id}`. (Source: https://stackoverflow.com/a/25888118/5445670.) – Solomon Ucko Mar 08 '19 at 13:54
  • 1
    Interesting case: I used the option @SolomonUcko mentioned (`?type=list`) to get the track list for this video: `CWlbjXwUMJI`, but then when trying to retrieve the track with `?lang=en&v=CWlbjXwUMJI` I just get a blank screen. YouTube error or is it selectively protecting some captions for some videos? – Chris Dec 15 '19 at 10:23
  • 1
    Currently, it's not working. – Nam Lee Sep 29 '22 at 15:35
  • 3
    The post you linked to is no longer available – katiedev Feb 02 '23 at 05:54
13

With API v3 you can first list the available caption tracks by requesting the snippet part:

https://www.googleapis.com/youtube/v3/captions?videoId=U1e2VNtEqm4&part=snippet&key=(my_api_key):

{
 "kind": "youtube#captionListResponse",
 "etag": "\"DsOZ7qVJA4mxdTxZeNzis6uE6ck/aGHflncRxq1Uz6m1akhrOLUWUqU\"",
 "items": [
  {
   "kind": "youtube#caption",
   "etag": "\"DsOZ7qVJA4mxdTxZeNzis6uE6ck/IC7rNKkn3SQNdovFwR6fEabUYnY\"",
   "id": "TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=",
   "snippet": {
    "videoId": "U1e2VNtEqm4",
    "lastUpdated": "2016-01-25T21:50:27.142Z",
    "trackKind": "standard",
    "language": "en-GB",
    "name": "",
    "audioTrackType": "unknown",
    "isCC": false,
    "isLarge": false,
    "isEasyReader": false,
    "isDraft": false,
    "isAutoSynced": false,
    "status": "serving"
   }
  },
  {
   "kind": "youtube#caption",
   "etag": "\"DsOZ7qVJA4mxdTxZeNzis6uE6ck/5UP1qPkmq6mzTUaEVnFC8WqjFgU\"",
   "id": "TqXDnlamg84o4bX0q2oaHw_Y53ilUWv6vMFbk0RL3XY=",
   "snippet": {
    "videoId": "U1e2VNtEqm4",
    "lastUpdated": "2016-01-25T21:55:07.481Z",
    "trackKind": "standard",
    "language": "en-US",
    "name": "",
    "audioTrackType": "unknown",
    "isCC": false,
    "isLarge": false,
    "isEasyReader": false,
    "isDraft": false,
    "isAutoSynced": false,
    "status": "serving"
   }
  }
 ]
}

And then pick the transcript you want:

https://www.googleapis.com/youtube/v3/captions/id?id=TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=

or

https://www.googleapis.com/youtube/v3/captions/TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=

at which point you need to provide an authorization key. Apparently a simple API key isn't enough, possibly because:

Quota impact: A call to this method has a quota cost of approximately 200 units.

Note the slight difference in the URLs (/captions/ versus /captions?).

All the lovely documentation is here: https://developers.google.com/youtube/v3/docs/captions
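The two steps above (list the tracks, then request one by id) might be sketched in Python roughly as follows. This is only an illustration: the helper names are hypothetical, no network call is made, and the `Authorization: Bearer` header assumes you already hold an OAuth 2.0 access token obtained elsewhere.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://www.googleapis.com/youtube/v3/captions"


def list_url(video_id, api_key):
    """URL for the list call; a plain API key is enough for this step."""
    params = {"videoId": video_id, "part": "snippet", "key": api_key}
    return API_BASE + "?" + urlencode(params)


def pick_caption_id(list_response, language):
    """Return the id of the first track whose snippet.language matches.

    list_response is the decoded JSON of the list call (see the sample
    response above); returns None when no track matches.
    """
    for item in list_response.get("items", []):
        if item.get("snippet", {}).get("language") == language:
            return item["id"]
    return None


def download_request(caption_id, access_token):
    """Return (url, headers) for fetching one track by id.

    The token is assumed to be an OAuth 2.0 access token; the API key
    used for the list call is not accepted here.
    """
    url = API_BASE + "/" + quote(caption_id, safe="=")
    return url, {"Authorization": "Bearer " + access_token}
```

With the sample response above decoded via `json.loads`, `pick_caption_id(response, "en-GB")` would select the first track, and the returned URL and headers can be handed to whatever HTTP client you use.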

Ken Sharp
  • 3
    Can't pick the transcript using the id; it shows error 404: "The caption track could not be found. Check the value of the request's id parameter to ensure that it is correct." – Abhishek Ramachandran Oct 20 '16 at 04:15
  • 1
    `pick the transcript you want` - this part doesn't work with an API key; it gives error HTTP 401, Login Required: "API keys are not supported by this API. Expected OAuth2 access token or other authentication credentials that assert a principal. See https://cloud.google.com/docs/authentication" – Kos Nov 17 '21 at 10:24
8

I may be wrong, but I don't think there is yet a documented way to get the caption track via v3 of the API. If you're authenticating with OAuth 2.0, however, your authentication will also be good for v2 of the API, so you could do a quick call to this feed:

http://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captiondata/[CAPTION TRACKID]

to get the data you want. To retrieve a list of possible caption track IDs with v2 of the API, you access this feed:

https://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captions

That feed request also accepts some optional parameters, including language, max-results, etc. For more details, along with a sample that shows the returned format of the caption track list, see the documentation at https://developers.google.com/youtube/2.0/developers_guide_protocol_captions#Retrieve_Caption_Set

jlmcdonald
6

Here's some code I wrote which grabs all the caption tracks from any YouTube video without having to use the API. Just plug the video URL into the $video_url variable.

// get video id from url (watch?v=, /v/ and youtu.be links)
$video_url = 'https://www.youtube.com/watch?v=kYX87kkyubk';
if (!preg_match("#(?<=v=)[a-zA-Z0-9_-]+(?=&)|(?<=v\/)[^&\n]+(?=\?)|(?<=v=)[^&\n]+|(?<=youtu\.be/)[^&\n]+#", $video_url, $matches)) {
    exit('Could not find a video id in the URL');
}

// get video info from id
$video_id = $matches[0];
$video_info = file_get_contents('http://www.youtube.com/get_video_info?video_id=' . urlencode($video_id));
if ($video_info === false) {
    exit('Could not fetch the video info');
}
parse_str($video_info, $video_info_array);

if (isset($video_info_array['caption_tracks'])) {
    // tracks are comma-separated; each one is itself a query string
    $tracks = explode(',', $video_info_array['caption_tracks']);

    // print info for each track (including url to track content)
    foreach ($tracks as $track) {
        parse_str($track, $output);
        print_r($output);
    }
}
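
For readers not using PHP, the same two parsing steps (pull the video id out of the URL, then split the caption_tracks value into one record per track) could look something like this in Python. It is a sketch only: the function names are invented here, it uses the standard URL parser instead of the regular expression above, and it makes no request to get_video_info.

```python
from urllib.parse import parse_qs, parse_qsl, urlparse


def extract_video_id(video_url):
    """Return the video id from watch?v=, /v/ or youtu.be links, else None."""
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if parsed.path.startswith("/v/"):
        return parsed.path[len("/v/"):].split("/")[0] or None
    values = parse_qs(parsed.query).get("v")
    return values[0] if values else None


def parse_caption_tracks(video_info):
    """Split a get_video_info body into one dict per caption track.

    Mirrors the PHP above: the caption_tracks value is comma-separated,
    and each entry is itself a query string. Returns [] if the key is absent.
    """
    values = parse_qs(video_info).get("caption_tracks")
    if not values:
        return []
    return [dict(parse_qsl(track)) for track in values[0].split(",")]
```

Using the URL parser avoids maintaining the lookbehind alternatives by hand, at the cost of handling each link style explicitly.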
kjdion84
  • 1
    I have created some code based on your caption tracks logic. If you want, we can add more support together; I have added translation details, for example. https://github.com/jamesjara/php-transcript-youtube-api-and-xml-parser/blob/master/example/test.php – jamesjara Jan 04 '17 at 08:51
  • Seems not to work on automatically transcribed videos... :( ! So probably the way to go is using the YouTube API v3 with a key – gtamborero Jan 31 '21 at 00:16
6

Probably the best way is to use the YouTube API v3. I'm trying it, but you need an API key plus an OAuth 2.0 user. A quick alternative is to use captionsgrabber and parse the returned HTML data.

Usage example:

https://www.captionsgrabber.com/8302/get-captions.00.php?id=UJTY7ilwSq4

// Where the id is the youtube video id

gtamborero