I have a function that grabs the source code of YouTube clips and then tries to find: startTimeMs, endTimeMs, and videoId.
This is the videoId block:
import requests


class className():
    def __init__(self, link):
        # make the request
        self.r = requests.get(link)

    def originalVideoID(self):
        # get the source code
        source = str(self.r.content)
        # these are the endpoints in which the videoID is enclosed
        start = "\"videoDetails\":{\"videoId\":\""
        end = '\"'
        # gets everything right of videoDetails
        videoID = source.split(start)[1]
        # gets everything left of the quote
        videoID = videoID.split(end)[0]
        return videoID
Expected Outcome:
If given a YouTube Clip URL like: https://www.youtube.com/clip/UgkxU2HSeGL_NvmDJ-nQJrlLwllwMDBdGZFs,
videoID should consistently be NiXD4xVJM5Y.
Actual Outcome:
- Sometimes, the expected outcome occurs.
- Other times, I get an IndexError from the line videoID = source.split(start)[1].
When debugging this, I added a check of start in source just before the source.split(start)[1] line, and it returns False whenever the IndexError is thrown. I have also printed str(self.r.content), which is where I can see that the source code is completely different in those cases.
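To make that check concrete, here is a minimal sketch of it as a helper that returns None instead of raising when the start marker is missing. The function name extract_between and both sample strings are hypothetical and used only for illustration; they are not real YouTube responses.

```python
from typing import Optional


def extract_between(source: str, start: str, end: str) -> Optional[str]:
    """Return the text between start and end, or None if start is missing."""
    # partition never raises: if start is absent, found is the empty string
    _, found, rest = source.partition(start)
    if not found:
        return None
    # keep everything left of the first end marker
    return rest.split(end)[0]


start = '"videoDetails":{"videoId":"'

# hypothetical page fragments, not real YouTube responses
good_page = 'var x = {"videoDetails":{"videoId":"NiXD4xVJM5Y","title":"t"}}'
other_page = '<html><body>some different page</body></html>'

print(extract_between(good_page, start, '"'))
print(extract_between(other_page, start, '"'))
```

With a helper like this, the two kinds of response can be told apart without a traceback, although it does not explain why the second kind of page is returned.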
What am I doing wrong? Is this a case for another tool like Selenium, am I using requests wrong, or should I approach this differently?
EDIT: This is the traceback on the error
Traceback (most recent call last):
  File "PATHTOPROJECT\FILENAME.py", line 383, in <module>
    main()
  File "PATHTOPROJECT\FILENAME.py", line 165, in download_video
    downloadLink = className(link).originalVideoID()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "PATHTOPROJECT\FILENAME.py", line 67, in originalVideoID
    videoID = source.split(start)[1]
              ~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
The data that I am seeking in the source code is within this script:
<script nonce="601b9hyYx1NEaPf0pQewqA">
var ytInitialPlayerResponse =
{
    ...
    "videoDetails":
    {
        "videoId":"NiXD4xVJM5Y", ...
    },
    ...
    "clipConfig":
    {
        "postId": ... ,"startTimeMs":"0","endTimeMs":"15000"
    }
}
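For reference, this is a rough sketch of pulling all three values out of text shaped like the excerpt above. It assumes each key appears as a quoted "key":"value" pair and simply takes the first match, which may not be the one under videoDetails on a real page. The function name find_clip_fields and the sample string are hypothetical.

```python
import re
from typing import Dict, Optional

FIELDS = ("videoId", "startTimeMs", "endTimeMs")


def find_clip_fields(source: str) -> Dict[str, Optional[str]]:
    """Map each field name to its first quoted value in source, or None."""
    result: Dict[str, Optional[str]] = {}
    for name in FIELDS:
        # matches "name":"value", allowing whitespace around the colon
        match = re.search(r'"%s"\s*:\s*"([^"]*)"' % re.escape(name), source)
        result[name] = match.group(1) if match else None
    return result


# hypothetical fragment modeled on the excerpt above, not a real response
sample = (
    'var ytInitialPlayerResponse = {"videoDetails":{"videoId":"NiXD4xVJM5Y"},'
    '"clipConfig":{"postId":"x","startTimeMs":"0","endTimeMs":"15000"}};'
)

print(find_clip_fields(sample))
print(find_clip_fields("<html>no player response here</html>"))
```

When the page does not contain the player response at all, every value comes back as None, which matches the cases where the printed source looks completely different.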