Parsing all possible YouTube urls

Question

I am looking for a list of all the `feature` values that a YouTube URL can have.

http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=related

So far I have seen feature=relmfu, related, fvst, and fvwrel. Is there a list of these somewhere? Also, my ultimate aim is to extract the video ID (6FWUjJF1ai0) from all possible YouTube URLs. How can I do that? It seems difficult. Has anyone already done this?

Who cares what features it can have? YouTube doesn't actually support the `fried_spam` feature, but if I pass you a link like http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=fried_spam, it works fine, and there's no reason you shouldn't extract `6FWUjJF1ai0` from that. - Karl Knechtel, Jan 11 '12 at 04:43
@Karl: How do I write a regex to extract the video ID from that URL? - Bruce, Jan 11 '12 at 04:51

score 6 · Accepted Answer · answered Jan 11 '12 at 04:50

You can use `urlparse` to get the query string from your URL, then use `parse_qs` to get the video ID from the query string.

Frank

score 3 · Answer 2 · answered Jan 11 '12 at 08:48

I wrote the code to help you out; the credit for the solution is entirely Frank's, though.

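# Python 3: urlparse and parse_qs live in urllib.parse (the Python 2 module was named urlparse)
from urllib import parse as ups
m = ups.urlparse('http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=related')
# parse_qs maps each query parameter to a list of its values
print(ups.parse_qs(m.query)['v'])  # ['6FWUjJF1ai0']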
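
Since `parse_qs` returns a list for each parameter, and a URL may have no `v` parameter at all, it can help to wrap the lookup in a small helper. The following is a minimal sketch assuming Python 3; the function name `video_id_from_query` is hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_query(url):
    """Return the first value of the `v` query parameter, or None if absent."""
    query = urlparse(url).query
    # parse_qs maps each parameter name to a list of values
    values = parse_qs(query).get("v")
    return values[0] if values else None


# The `feature` parameter is ignored, whatever its value is.
print(video_id_from_query("http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=related"))
print(video_id_from_query("http://www.youtube.com/watch?v=6FWUjJF1ai0&feature=fried_spam"))
```

Both calls should yield the same ID, which matches the point made in the comments: the value of `feature` does not matter for extraction.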

Arnab Ghosal

score 0 · Answer 3 · answered Sep 23 '17 at 02:59

I ran 55 different test cases against the regex from this answer, https://stackoverflow.com/a/43490746/8534966, and it matched 51 of them. See my tests.

So I wrote some if/else code to handle the remaining cases:

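import re

# Get YouTube video ID (youtube_url is assumed to already hold the URL string)
youtube_id = None  # stays None if no pattern matches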
if "watch%3Fv%3D" in youtube_url:
    # e.g.: https://www.youtube.com/attribution_link?a=8g8kPrPIi-ecwIsS&u=/watch%3Fv%3DyZv2daTWRZU%26feature%3Dem-uploademail
    search_pattern = re.search("watch%3Fv%3D(.*?)%", youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
elif "watch?v%3D" in youtube_url:
    # e.g.: http://www.youtube.com/attribution_link?a=JdfC0C9V6ZI&u=%2Fwatch%3Fv%3DEhxJLojIE_o%26feature%3Dshare
    search_pattern = re.search("v%3D(.*?)&format", youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
elif "/e/" in youtube_url:
    # e.g.: http://www.youtube.com/e/dQw4w9WgXcQ
    youtube_url += " "
    search_pattern = re.search("/e/(.*?) ", youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
else:
    # All else.
    search_pattern = re.search(r"(?:[?&]vi?=|/embed/|/\d\d?/|/vi?/|https?://(?:www\.)?youtu\.be/)([^&\n?#]+)",
                               youtube_url)
    if search_pattern:
        youtube_id = search_pattern.group(1)
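
The percent-encoded `attribution_link` cases above can also be handled by decoding the URL rather than matching the encoded text. The sketch below is one possible alternative, not the code from this answer: it assumes Python 3, uses a hypothetical function name `extract_video_id`, and covers only the URL shapes mentioned in this thread (`watch?v=`, `attribution_link`, `/e/`, `/embed/`, `/v/`, `/vi/`, and `youtu.be`).

```python
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the video ID for the URL shapes listed above, else None."""
    parts = urlparse(url)
    query = parse_qs(parts.query)  # parse_qs also percent-decodes the values

    # attribution_link URLs carry the real watch URL, percent-encoded, in `u`
    if parts.path == "/attribution_link" and "u" in query:
        return extract_video_id(query["u"][0])

    # Regular watch URLs: ...?v=<id>&feature=...
    if "v" in query:
        return query["v"][0]

    segments = [s for s in parts.path.split("/") if s]

    # Short links: youtu.be/<id>
    if parts.netloc.lower() in ("youtu.be", "www.youtu.be") and segments:
        return segments[0]

    # Path-based forms: /e/<id>, /embed/<id>, /v/<id>, /vi/<id>
    if len(segments) >= 2 and segments[0] in ("e", "embed", "v", "vi"):
        return segments[1]

    return None


print(extract_video_id(
    "http://www.youtube.com/attribution_link?a=JdfC0C9V6ZI&u=%2Fwatch%3Fv%3DEhxJLojIE_o%26feature%3Dshare"
))
```

Decoding first means the same `v` lookup serves both plain and encoded URLs, so no separate pattern is needed for each encoding variant.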

score 0 · Answer 4 · answered Feb 18 '18 at 23:17

You may want to consider a more general URL parser, as suggested in this Gist.

It handles more URL forms than `urlparse` does on its own.

vinyll
