10

Is it possible to pull the auto (non-user) generated video transcripts from any of the YouTube APIs?

Ted
  • Did you find a solution yet? – manish1706 Jan 02 '18 at 10:44
  • 3
    @manish1706 None of the solutions I could find allowed me to retrieve automatically generated subtitles. Therefore I implemented an API Client myself, which allows you to get automatically generated subtitles for a language of your choice. Code can be found on my GitHub if anyone is still interested: https://github.com/jdepoix/youtube-transcript-api – jdepoix Feb 27 '19 at 11:01

3 Answers

6

As of Aug 2019, the following method allows you to download transcripts:

  1. Open in Browser

https://www.youtube.com/watch?v=[Video ID]

  2. From Console type:
    JSON.parse(ytplayer.config.args.player_response).captions.playerCaptionsTracklistRenderer.captionTracks[0].baseUrl
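The console expression above always takes the first caption track. If you have saved the player_response string to work with offline, the same lookup can be sketched in Python. This is only an illustration: the function name pick_caption_url and its parameters are hypothetical, and it assumes each track carries a languageCode field and that auto-generated tracks are marked with kind set to "asr".

```python
import json


def pick_caption_url(player_response_json, language="en", prefer_auto=True):
    """Return the baseUrl of one caption track, or None if there are none."""
    data = json.loads(player_response_json)
    # Same path as in the console expression above.
    tracks = (
        data.get("captions", {})
        .get("playerCaptionsTracklistRenderer", {})
        .get("captionTracks", [])
    )
    # Keep only the tracks in the requested language.
    candidates = [t for t in tracks if t.get("languageCode") == language]
    # Assumption: auto-generated tracks are marked with kind == "asr".
    wanted = [t for t in candidates if (t.get("kind") == "asr") == prefer_auto]
    # Fall back to any track in that language, then to any track at all.
    chosen = wanted or candidates or tracks
    return chosen[0].get("baseUrl") if chosen else None
```

Calling pick_caption_url(saved_json) with the defaults prefers an English auto-generated track and falls back to whatever track exists, which matches captionTracks[0] when only one track is present.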
mundanelunacy
  • 75
  • 1
  • 3
3

1 Install youtube-transcript-api (https://github.com/jdepoix/youtube-transcript-api), e.g.:

pip3 install youtube_transcript_api

2 Create youtube_transcript_api-wrapper.py with the following code (based partially on https://stackoverflow.com/a/65325576/2585501):

from youtube_transcript_api import YouTubeTranscriptApi

# For a single video: srt = YouTubeTranscriptApi.get_transcript(video_id)

# Read one video ID per line.
videoListName = "youtubeVideoIDlist.txt"
with open(videoListName) as f:
    video_ids = f.read().splitlines()

# Fetch all transcripts; IDs that fail are collected instead of raising.
transcript_list, unretrievable_videos = YouTubeTranscriptApi.get_transcripts(video_ids, continue_after_error=True)

for video_id in video_ids:
    if video_id in transcript_list:
        print("\nvideo_id = ", video_id)
        srt = transcript_list[video_id]
        # Join the text of every transcript entry into one string.
        text = ' '.join(entry['text'] for entry in srt)
        print(text)

3 Create youtubeVideoIDlist.txt containing a list of video_ids

4 Run the script: python3 youtube_transcript_api-wrapper.py
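The script above joins everything into one block of text. If you would rather keep the timing, a minimal sketch along these lines may help. It assumes each transcript entry is a dict with a 'start' key (seconds, as a float) next to the 'text' key used above; the helper names format_timestamp and transcript_to_lines are hypothetical and not part of the library.

```python
def format_timestamp(seconds):
    """Convert a number of seconds into HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcript_to_lines(entries):
    """Return one '[HH:MM:SS] text' line per non-empty transcript entry."""
    lines = []
    for entry in entries:
        # Caption text can contain line breaks; flatten them.
        text = entry["text"].replace("\n", " ").strip()
        if text:
            # Assumption: 'start' holds the entry's start time in seconds.
            lines.append(f"[{format_timestamp(entry['start'])}] {text}")
    return lines
```

In the loop above you could then replace the join with print("\n".join(transcript_to_lines(srt))).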

user2585501
2

You may refer to this thread: How to get "transcript" in youtube-api v3

If you're authenticating with oAuth2, you could do a quick call to this feed:

http://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captiondata/[CAPTIONTRACKID]

to get the data you want. To retrieve a list of possible caption track IDs with v2 of the API, you access this feed:

https://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captions

That feed request also accepts some optional parameters, including language, max-results, etc. For more details, along with a sample that shows the returned format of the caption track list, see the documentation at https://developers.google.com/youtube/2.0/developers_guide_protocol_captions#Retrieve_Caption_Set

Also, here are some references which might help:

Community
abielita