14

Until 4 days ago, you could send a GET request to (or visit) https://video.google.com/timedtext?lang=en&v={youtubeVideoId} and receive an XML response containing the caption track of a given YouTube video. Does anyone know whether this support has been removed? As of tonight, it no longer returns the XML response with the captions; the page is simply empty for every video. Numerous videos that worked 4 days ago no longer work. Thanks in advance.

Dillon Duff
  • see related (but old) issue https://issuetracker.google.com/issues/170235670 – Kos Nov 12 '21 at 17:46
  • Let me add that this did not require access to the API whatsoever; no API key needed, the site at the URL was the xml file regardless of how you accessed it – Dillon Duff Nov 16 '21 at 05:11
  • 1
    See this issue on Google tracker: https://issuetracker.google.com/issues/207527674 – Kos Dec 13 '21 at 14:05
  • why do sites like youglish still work? How did they get captions from youtube videos? – Nam Lee Sep 25 '22 at 04:29

5 Answers

10

Captions in default language (the only available track, or English, it seems):

To get the captions of a YouTube video, use this Linux command (it relies on curl and base64):

curl -s 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.9999099\"}},\"params\":\"$(printf '\n\x0bVIDEO_ID' | base64)\"}"

Replace VIDEO_ID with the ID of the video you are interested in.

Note: the key isn't a YouTube Data API v3 key. It is the first public key (tested on several computers in different countries) that shows up when you run curl https://www.youtube.com/ | grep AIzaSy
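The grep step in the note above can be sketched in Python as follows. This is only an illustration: the `find_first_key` helper, the regular expression, and the sample page fragment are hypothetical, and the pattern assumes the key is a run of URL-safe characters beginning with the `AIzaSy` prefix that the grep command searches for.

```python
import re
from typing import Optional

# Assumption: a key is "AIzaSy" followed by URL-safe characters
# (letters, digits, "_" and "-"). This mirrors `grep AIzaSy`, nothing more.
KEY_PATTERN = re.compile(r"AIzaSy[0-9A-Za-z_-]+")


def find_first_key(page_source: str) -> Optional[str]:
    """Return the first key-like token in the page source, or None if absent."""
    match = KEY_PATTERN.search(page_source)
    return match.group(0) if match else None


# Hypothetical page fragment; the real page is much larger and laid out differently.
sample = '<script>var cfg = {"apiKey":"AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8"};</script>'
print(find_first_key(sample))
```

In practice, `page_source` would be the body returned by fetching https://www.youtube.com/, for example with `requests.get(...).text`.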

Note: If you are interested in how I reverse-engineered this YouTube feature, say so in the comments and I will write a paragraph explaining it.

Captions in desired language if available:

YouTube made this step tricky, perhaps to throw you off, so follow along: the only thing we have to change is the params value. It is base64-encoded data that contains, alongside some non-printable bytes, a nested base64 string that itself encodes non-printable bytes.

  1. Get the language initials, like "ru" for Russian.
  2. Encode \n\x00\x12\x02LANGUAGE_INITIALS\x1a\x00 in base64, for instance with A=$(printf '\n\x00\x12\x02LANGUAGE_INITIALS\x1a\x00' | base64) (don't forget to replace LANGUAGE_INITIALS with the language initials you want, ru for instance). The result for ru is CgASAnJ1GgA=
  3. URL-encode the result, which replaces = with %3D, for instance with B=$(printf %s $A | jq -sRr @uri). The result for ru is CgASAnJ1GgA%3D
  4. Only if using shell commands: double the single % so that printf in the next step treats it literally, for instance with C=$(echo $B | sed 's/%/%%/'). The result for ru is CgASAnJ1GgA%%3D
  5. Encode \n\x0bVIDEO_ID\x12\x0e$C in base64 (don't forget to replace VIDEO_ID with your video ID; $C is the result of the previous step), for instance with D=$(printf "\n\x0bVIDEO_ID\x12\x0e$C" | base64). The result for ru and lo0X2ZdElQ4 is CgtsbzBYMlpkRWxRNBIOQ2dBU0FuSjFHZ0ElM0Q=
  6. Use this params value in the request from the Captions in default language section: curl -s 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.2021111\"}},\"params\":\"$D\"}"
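The six steps above can be sketched in Python, where step 4 is unnecessary because no shell printf is involved. The `language_params` function name is hypothetical. The sketch also assumes that the bytes \x02, \x0b and \x0e in the steps are single-byte length prefixes (2, 11 and 14 are exactly the lengths of "ru", the video ID, and the URL-encoded inner string), so it computes them instead of hard-coding them.

```python
import base64
from urllib.parse import quote


def language_params(video_id: str, language: str) -> str:
    """Build the params value for a given video ID and language initials."""
    # Step 2: inner payload, base64-encoded.
    # Assumption: the byte before the language initials is their length.
    inner_raw = (
        b"\n\x00\x12" + bytes([len(language)]) + language.encode("ascii") + b"\x1a\x00"
    )
    inner = base64.b64encode(inner_raw).decode("ascii")

    # Step 3: URL-encode the inner base64 string ("=" becomes "%3D").
    inner_quoted = quote(inner, safe="")

    # Step 5: outer payload. Assumption: \x0b and \x0e are length bytes,
    # which only works while each length stays below 128.
    outer_raw = (
        b"\n" + bytes([len(video_id)]) + video_id.encode("ascii")
        + b"\x12" + bytes([len(inner_quoted)]) + inner_quoted.encode("ascii")
    )
    return base64.b64encode(outer_raw).decode("ascii")


# Reproduces the value given in step 5 for "ru" and lo0X2ZdElQ4.
print(language_params("lo0X2ZdElQ4", "ru"))
```

The returned string is what goes into the "params" field of the JSON body in step 6.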
Benjamin Loison
  • tried the curl command but got an error: 405. That’s an error. The request method GET is inappropriate for the URL /youtubei/v1/get_transcript. That’s all we know. – user3499381 Dec 14 '21 at 03:35
  • 1
    When I wrote it, it worked. Testing again now, I first got the same result as you, but on a further test I no longer hit the bug, even through Tor: `torsocks curl -s 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -X POST -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.2021111\"}},\"params\":\"$(printf '\n\x0bWdy4YBULvdo' | base64)\"}"` Can you confirm? – Benjamin Loison Dec 14 '21 at 16:05
  • that seems to work. I'll need to figure out how to adapt it to my application. looks like all I have to do is replace it with my key, but not sure where I insert my videoID. is it "x0bWdy4YBULvdo"? – user3499381 Dec 14 '21 at 19:30
  • i get an error when i replace my key with yours: "message": "YouTube Internal API (InnerTube) has not been used in project 914232051599 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/youtubei.googleapis.com/overview?project=914232051599 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.", – user3499381 Dec 14 '21 at 19:55
  • As specified in my original answer, the video ID I used was "Wdy4YBULvdo", and the key used isn't a YouTube Data API v3 key, so don't try to change it. If you run into a problem with it in the future, check the original answer to see how I got it. – Benjamin Loison Dec 14 '21 at 20:06
  • sorry i missed that part when i originally read it. – user3499381 Dec 15 '21 at 02:18
  • I think google wants you to register your website with them and that somehow authorizes the query. unfortunately their documentation is pretty bad and I can't seem to get this to work at all. – user3499381 Dec 30 '21 at 20:08
  • On my side, my shell command is still working. Could you specify the problems you are facing? As far as I understand what I am doing, no website or registration is involved here. And see my `Note:` if the command isn't working because of the key. – Benjamin Loison Dec 30 '21 at 23:25
  • This is brilliant, as it is still working to this day. May I ask how you learned this trick? Is the API key yours? – Dương Tùng Anh Jun 09 '22 at 09:51
  • 1
    It's not a YouTube API key, it's the YouTube UI key. In fact I just used the Network tab from my web-browser and loaded captions on a video and reverse-engineered how the requests were working. – Benjamin Loison Jun 09 '22 at 11:15
  • Does this still work or what I am doing wrong? I get 400 Bad Request with Win10 cmd. curl -v "https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8" -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.2021111\"}},\"params\":\"CgtsbzBYMlpkRWxRNBIOQ2dBU0FtVnVHZ0ElM0Q=\"}" – John_Sheares Mar 08 '23 at 21:05
  • [@John_Sheares](https://stackoverflow.com/users/313978/john-sheares) On Linux Mint 21.1 (curl 7.81.0) I executed the command you shared (without the `-v`) and [got a response with the wanted subtitles](https://gitlab.com/-/snippets/2510752). In fact, I would recommend using `youtube-dl` or `yt-dlp` for retrieving captions, as they offer this feature and are well-established software. – Benjamin Loison Mar 08 '23 at 21:13
  • Got it working with the Windows cmd line. Had to format the -H option by changing the single quotes to double quotes. – John_Sheares Mar 09 '23 at 07:03
  • Working in Windows Powershell with the following: $base64 = [Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes("`n`ID HERE")) $headers = @{ "Content-Type" = "application/json" } $body = @{ "context" = @{ "client" = @{ "clientName" = "WEB" "clientVersion" = "2.9999099" } } "params" = $base64 } | ConvertTo-Json $response = Invoke-WebRequest 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -Method Post -Headers $headers -Body $body Write-Output $response.Content – Zach Johnson May 31 '23 at 20:37
2

I recommend that anyone using Python try the youtube_transcript_api module. I used to send a GET request to https://video.google.com/timedtext?lang=en&v={videoId}, but now the page is blank. Here is a code example. This method also does not need an API key.

from youtube_transcript_api import YouTubeTranscriptApi
# Replace "videoId" with the ID of your video.
srt = YouTubeTranscriptApi.get_transcript("videoId", languages=['en'])
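As a possible next step, the result can be turned into SubRip (SRT) text. The sketch below assumes that the returned value is a list of dictionaries with "text", "start" and "duration" keys (times in seconds); check this against the version of the module you have installed. The `to_timestamp` and `to_srt` helpers and the sample entries are hypothetical.

```python
def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(entries) -> str:
    """Join transcript entries into numbered SRT cues separated by blank lines."""
    cues = []
    for index, entry in enumerate(entries, start=1):
        start = entry["start"]
        end = start + entry["duration"]  # assumed: duration is in seconds
        cues.append(
            f"{index}\n{to_timestamp(start)} --> {to_timestamp(end)}\n{entry['text']}\n"
        )
    return "\n".join(cues)


# Hypothetical entries in the assumed shape.
sample = [
    {"text": "Hello", "start": 0.0, "duration": 1.5},
    {"text": "world", "start": 1.5, "duration": 2.0},
]
print(to_srt(sample))
```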
Benjamin Loison
Fly your ideas
2

The YouTube API change around captions caused me a lot of hassle, which I worked around by using youtube-dl. It has won legal support from GitHub and is once again available for download/clone.

The software is available as a source or binary download for all major platforms; see their GitHub page for details.

Sample usage is this simple:

youtube-dl --write-sub --sub-lang en --skip-download --sub-format vtt https://www.youtube.com/watch?v=E-lZ8lCG7WY
Lee Goddard
0

The old API currently returns 404 on every request. YouTube now uses a new version of this API:

https://www.youtube.com/api/timedtext?v={youtubeVideoId}&asr_langs=de%2Cen%2Ces%2Cfr%2Cid%2Cit%2Cja%2Cko%2Cnl%2Cpt%2Cru%2Ctr%2Cvi&caps=asr&exp=xftt%2Cxctw&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1637102374&sparams=ip%2Cipbits%2Cexpire%2Cv%2Casr_langs%2Ccaps%2Cexp%2Cxoaf&signature=0BEBD68A2638D8A18A5BC78E1851D28300247F93.7D5E6D26397D8E8A93F65CCA97260D090C870462&key=yt8&kind=asr&lang=en&fmt=json3

The main problem with this API is calculating the signature field of the request. Unfortunately, I couldn't find its algorithm. Maybe someone can reverse-engineer it from the YouTube player.
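To see what the signature would have to cover, the URL above can be split into its query parameters. This sketch rests on a guess that is not confirmed anywhere in this thread: that sparams lists the names of the parameters included in the signature. If that guess holds, changing v or expire would invalidate the signature, while the remaining parameters could be changed freely.

```python
from urllib.parse import parse_qs, urlsplit

# The URL from this answer, split across lines for readability only.
url = (
    "https://www.youtube.com/api/timedtext?v={youtubeVideoId}"
    "&asr_langs=de%2Cen%2Ces%2Cfr%2Cid%2Cit%2Cja%2Cko%2Cnl%2Cpt%2Cru%2Ctr%2Cvi"
    "&caps=asr&exp=xftt%2Cxctw&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1637102374"
    "&sparams=ip%2Cipbits%2Cexpire%2Cv%2Casr_langs%2Ccaps%2Cexp%2Cxoaf"
    "&signature=0BEBD68A2638D8A18A5BC78E1851D28300247F93"
    ".7D5E6D26397D8E8A93F65CCA97260D090C870462"
    "&key=yt8&kind=asr&lang=en&fmt=json3"
)

query = parse_qs(urlsplit(url).query)

# Assumption: sparams names the parameters that the signature is computed over.
signed_names = query["sparams"][0].split(",")

# Everything else, apart from sparams and signature themselves.
unsigned_names = sorted(set(query) - set(signed_names) - {"sparams", "signature"})

print("listed in sparams:", signed_names)
print("not listed:", unsigned_names)
```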

Alexander Ushakov
0

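This is a working Python implementation of the curl answer provided by Benjamin Loison. Replace "ZhT6BeHNmvo" with your video ID, keeping the "\n\v" prefix in front of it.

import base64
import json
import requests

# "\n\v" is the same two-byte prefix as in the curl command
# ("\v" is the Python escape for the byte 0x0b); the video ID follows it.
base64_string = base64.b64encode("\n\vZhT6BeHNmvo".encode("utf-8")).decode("utf-8")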

headers = {
    "Content-Type": "application/json",
}

body = json.dumps(
    {
        "context": {"client": {"clientName": "WEB", "clientVersion": "2.9999099"}},
        "params": base64_string,
    }
)

response = requests.post(
    "https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8",
    headers=headers,
    data=body,
)

print(response.text)
Benjamin Loison
Zach Johnson