58

I am using the following regex to validate YouTube video share URLs.

var valid = /^(http\:\/\/)?(youtube\.com|youtu\.be)+$/;
alert(valid.test(url));
return false;

I want the regex to support the following URL formats:

http://youtu.be/cCnrX1w5luM  
http://youtube/cCnrX1w5luM  
www.youtube.com/cCnrX1w5luM  
youtube/cCnrX1w5luM  
youtu.be/cCnrX1w5luM   

I tried several different regexes, but none of them works for share links. Can anyone help me solve this?

starball
Jenz

11 Answers

101

Here's a regex I use to match and capture the important bits of YouTube URLs with video codes:

^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|live\/|v\/)?)([\w\-]+)(\S+)?$

Works with the following URLs:

https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured
https://www.youtube.com/watch?v=DFYRQ_zQ-gk
http://www.youtube.com/watch?v=DFYRQ_zQ-gk
//www.youtube.com/watch?v=DFYRQ_zQ-gk
www.youtube.com/watch?v=DFYRQ_zQ-gk
https://youtube.com/watch?v=DFYRQ_zQ-gk
http://youtube.com/watch?v=DFYRQ_zQ-gk
//youtube.com/watch?v=DFYRQ_zQ-gk
youtube.com/watch?v=DFYRQ_zQ-gk

https://m.youtube.com/watch?v=DFYRQ_zQ-gk
http://m.youtube.com/watch?v=DFYRQ_zQ-gk
//m.youtube.com/watch?v=DFYRQ_zQ-gk
m.youtube.com/watch?v=DFYRQ_zQ-gk

https://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
http://www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
//www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
www.youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US
youtube.com/v/DFYRQ_zQ-gk?fs=1&hl=en_US

https://www.youtube.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube.com/embed/DFYRQ_zQ-gk
http://www.youtube.com/embed/DFYRQ_zQ-gk
//www.youtube.com/embed/DFYRQ_zQ-gk
www.youtube.com/embed/DFYRQ_zQ-gk
https://youtube.com/embed/DFYRQ_zQ-gk
http://youtube.com/embed/DFYRQ_zQ-gk
//youtube.com/embed/DFYRQ_zQ-gk
youtube.com/embed/DFYRQ_zQ-gk

https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk?autoplay=1
https://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
//www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
www.youtube-nocookie.com/embed/DFYRQ_zQ-gk
https://youtube-nocookie.com/embed/DFYRQ_zQ-gk
http://youtube-nocookie.com/embed/DFYRQ_zQ-gk
//youtube-nocookie.com/embed/DFYRQ_zQ-gk
youtube-nocookie.com/embed/DFYRQ_zQ-gk

https://youtu.be/DFYRQ_zQ-gk?t=120
https://youtu.be/DFYRQ_zQ-gk
http://youtu.be/DFYRQ_zQ-gk
//youtu.be/DFYRQ_zQ-gk
youtu.be/DFYRQ_zQ-gk

https://www.youtube.com/HamdiKickProduction?v=DFYRQ_zQ-gk

https://www.youtube.com/live/sMbxjePPmkw?feature=share

The captured groups are:

  1. protocol
  2. subdomain
  3. domain
  4. -nocookie suffix (only set for youtube-nocookie.com)
  5. path
  6. video code
  7. query string
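
As a rough sketch of how the groups can be read in JavaScript (the helper name `parseYouTubeUrl` and the property names are my own, not part of the answer): groups are numbered by opening parenthesis, and `(-nocookie)` is itself a capturing group, so in the regex as written the video code lands at index 6 and the trailing query string at index 7.

```javascript
// Regex copied verbatim from the answer above.
const YT_URL = /^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|live\/|v\/)?)([\w\-]+)(\S+)?$/;

// Hypothetical helper: returns the named parts, or null when the URL does not match.
function parseYouTubeUrl(url) {
  const m = YT_URL.exec(url);
  if (m === null) return null;
  // Index 4 is the inner (-nocookie) group, so the later groups shift by one.
  return {
    protocol: m[1],
    subdomain: m[2],
    domain: m[3],
    path: m[5],
    videoCode: m[6],
    query: m[7],
  };
}

console.log(parseYouTubeUrl("https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured"));
```

Groups that did not take part in the match (for example the protocol in `youtu.be/DFYRQ_zQ-gk`) come back as `undefined`.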

https://regex101.com/r/vHEc61/1

nezort11
phuc77
  • 6
    https://youtube.com/foo_bar- and https://youtube.com/foo_bar and https://www.youtube.com/watch?v= are not valid YouTube video URLs, but this regex will match them. – Joey Mason Jul 27 '17 at 17:00
  • It doesn't match a valid link like https://www.youtube.com/live/sMbxjePPmkw?feature=share . I have added `|live\/` after `|embed\/` part. **Final regex version**: `^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu.be))(\/(?:[\w\-]+\?v=|embed\/|live\/|v\/)?)([\w\-]+)(\S+)?$` – nezort11 May 21 '23 at 22:59
  • This matches `youtuxbe/-` which is not a valid YouTube URL. Changing the regex to `^((?:https?:)?\/\/)?((?:www|m)\.)?((?:youtube(-nocookie)?\.com|youtu\.be))(\/(?:[\w\-]+\?v=|embed\/|live\/|v\/)?)([\w\-]+)(\S+)?$` fixes this for me. – Ilshidur Aug 31 '23 at 08:29
58
  • You're missing www in your regex.
  • The second \. should be optional if you want to match both youtu.be and youtube (but I didn't change this, since plain youtube isn't actually a valid domain; see the note below).
  • The + in your regex allows one or more repetitions of (youtube\.com|youtu\.be), not one or more wildcard characters.
    You need a . to indicate a wildcard and a + to say you want one or more of them.

Try:

^(https?\:\/\/)?(www\.youtube\.com|youtu\.be)\/.+$

Live demo.
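
To make the difference concrete, here is a small JavaScript check comparing the pattern from the question with the first suggested pattern above. The variable names are only for illustration.

```javascript
// Pattern from the question: the trailing + repeats the host group itself.
const original = /^(http\:\/\/)?(youtube\.com|youtu\.be)+$/;
// First suggested pattern from this answer: host, a slash, then one or more characters.
const fixed = /^(https?\:\/\/)?(www\.youtube\.com|youtu\.be)\/.+$/;

console.log(original.test("http://youtu.be/cCnrX1w5luM")); // false: nothing may follow the host
console.log(original.test("http://youtu.beyoutube.com"));  // true: two hosts back to back
console.log(fixed.test("http://youtu.be/cCnrX1w5luM"));    // true
console.log(fixed.test("www.youtube.com/cCnrX1w5luM"));    // true
```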

If you want it to match URLs with or without the www., just make it optional:

^(https?\:\/\/)?((www\.)?youtube\.com|youtu\.be)\/.+$

Live demo.

Invalid alternatives:

If you want www.youtu.be/... to also match (at the time of writing, this doesn't appear to be a valid URL format), put the optional www. outside the brackets:

^(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.be)\/.+$

youtube/cCnrX1w5luM (with or without http://) isn't a valid URL, but the question explicitly mentions that the regex should support that. To include this, replace youtu\.be with youtu\.?be in any regex above. Live demo.
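
A quick way to confirm that this last variant covers the five formats listed in the question is to run them through it. This is only a sketch: it combines the optional-www pattern with the `youtu\.?be` substitution described above, and `lenient` is my own name for it.

```javascript
// Optional-www pattern with youtu\.be replaced by youtu\.?be, as described above.
const lenient = /^(https?\:\/\/)?((www\.)?youtube\.com|youtu\.?be)\/.+$/;

// The five formats from the question.
const wanted = [
  "http://youtu.be/cCnrX1w5luM",
  "http://youtube/cCnrX1w5luM",
  "www.youtube.com/cCnrX1w5luM",
  "youtube/cCnrX1w5luM",
  "youtu.be/cCnrX1w5luM",
];

for (const url of wanted) {
  console.log(url, lenient.test(url));
}
```

All five should print `true`. A host with nothing after the slash, such as `http://youtube/`, is still rejected because `.+` needs at least one character.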

Bernhard Barker
20

I know I'm about 2 years late to the party, but I needed to write something up anyway, and this seems to fit every test case I can throw at it. You should be able to reference the first capture group ($1) to get the ID. It matches http and https, www and non-www, youtube.com and youtu.be, and /watch? and /watch.php? on youtube.com (youtu.be does not use these), and it still matches when there are other variables in the URL string (?t= for time, ?list= for playlists, etc.).

(?:https?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]+)
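
For illustration, a minimal JavaScript sketch of pulling the ID out through the first capture group. `extractVideoId` is a hypothetical helper name; the regex is the one above, unchanged. Because the pattern is not anchored, it also finds a link embedded in longer text.

```javascript
// Regex copied from the answer above (unanchored).
const YT_ID = /(?:https?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]+)/;

// Hypothetical helper: returns the ID from capture group 1, or null if no link is found.
function extractVideoId(text) {
  const m = text.match(YT_ID);
  return m ? m[1] : null;
}

console.log(extractVideoId("https://www.youtube.com/watch?v=DFYRQ_zQ-gk&t=10"));
console.log(extractVideoId("https://youtu.be/DFYRQ_zQ-gk?t=120"));
console.log(extractVideoId("see http://m.youtube.com/watch?list=PL123&v=DFYRQ_zQ-gk now"));
```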
mzalazar
xeon927
  • Any chance you could update this to support https://www.youtube.com/watch/IDHERE, which is valid? – Jacob Morrison Jun 17 '16 at 03:30
  • 4
    @JacobMorrison Another two years late, but what the hell - updated the code :) – xeon927 Oct 13 '18 at 11:56
  • `^(?:https?:)?(?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]{7,15})(?:[\?&][a-zA-Z0-9\_-]+=[a-zA-Z0-9\_-]+)*$` Improved it a bit so it checks entry starts and ends with url, so things like `extra text youtube.com/embed/DFYRQ_zQ-gk extra text` are not valid. Also added validation id is not less than 7 symbols – cuddlemeister Mar 02 '21 at 17:49
12

Format for YouTube videos has changed. This regex works for all cases:

^(http(s)??\:\/\/)?(www\.)?((youtube\.com\/watch\?v=)|(youtu.be\/))([a-zA-Z0-9\-_])+

Tests here.
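
One detail matters if you need the ID rather than a yes/no answer: in the pattern as posted, the `+` sits outside the last group, and a repeated capturing group only keeps its final iteration. The sketch below compares the posted pattern with a variant that moves the `+` inside the parentheses; both accept the same URLs, and the variable names are mine.

```javascript
// As posted: the + repeats the group, so group 7 holds only the last character.
const asPosted = /^(http(s)??\:\/\/)?(www\.)?((youtube\.com\/watch\?v=)|(youtu.be\/))([a-zA-Z0-9\-_])+/;
// Variant with the + inside the group, so group 7 holds the whole ID.
const plusInside = /^(http(s)??\:\/\/)?(www\.)?((youtube\.com\/watch\?v=)|(youtu.be\/))([a-zA-Z0-9\-_]+)/;

const sample = "https://youtu.be/DFYRQ_zQ-gk";
console.log(asPosted.exec(sample)[7]);   // "k"
console.log(plusInside.exec(sample)[7]); // "DFYRQ_zQ-gk"
```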

Joey Mason
  • what has changed? phuc77's answer seems better. – Ashish Gupta Jul 27 '17 at 11:32
  • 2
    Not all of these tests will pass using phuc77's answer: https://regex101.com/r/RyE7OM/2/tests. Specifically, https://youtube.com/foo_bar and https://www.youtube.com/watch?v= should not validate. – Joey Mason Jul 27 '17 at 16:56
  • This answer should be used by anyone searching for a solution. It is the best I've found till now. – Naveen Niraula Jun 05 '18 at 08:51
  • 1
    If you wanted to catch the ID, then there's a typo in your regex, the + sign at the end should be before the last parenthesis because otherwise it's going to capture only last letter. The final regex should look like this `^(http(s)??\:\/\/)?(www\.)?((youtube\.com\/watch\?v=)|(youtu.be\/))([a-zA-Z0-9\-_]+)` – Maciej Pk Feb 15 '19 at 09:15
  • The phuc77 seems better, this answer doesn't pass all the test : https://regexr.com/4b2fh – user2226755 Mar 27 '19 at 06:02
5

Based on several of the other regexes, this is the best I have come up with:

((http(s)?:\/\/)?)(www\.)?((youtube\.com\/)|(youtu.be\/))[\S]+

Test: http://regexr.com/3bga2

yusuf
3

Try this:

((http://)?)(www\.)?((youtube\.com/)|(youtu\.be)|(youtube)).+

http://regexr.com?36o7a

Games Brainiac
  • There are a few unnecessary brackets there - `...(youtube\.com/|youtu.be|youtube).*`, and you probably want to escape the `.` in `youtu.be`, and you may want to put the `/` outside (so it's included for `youtu.be` and `youtube`). – Bernhard Barker Oct 15 '13 at 09:41
3

I took one of the answers from here and added support for a few edge cases that I noticed in my dataset. This should work for pretty much any valid URL.

^(?:https?:)?(?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]{7,15})(?:[\?&][a-zA-Z0-9\_-]+=[a-zA-Z0-9\_-]+)*(?:[&\/\#].*)?$
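
A short JavaScript sketch of how this anchored version behaves, using the regex unchanged. `strictVideoId` is a hypothetical name. Unlike an unanchored pattern, it rejects input with surrounding text and IDs shorter than 7 characters.

```javascript
// Regex copied from the answer above (anchored with ^ and $, ID limited to 7-15 characters).
const STRICT = /^(?:https?:)?(?:\/\/)?(?:youtu\.be\/|(?:www\.|m\.)?youtube\.com\/(?:watch|v|embed)(?:\.php)?(?:\?.*v=|\/))([a-zA-Z0-9\_-]{7,15})(?:[\?&][a-zA-Z0-9\_-]+=[a-zA-Z0-9\_-]+)*(?:[&\/\#].*)?$/;

// Hypothetical helper: returns the ID, or null when the whole input is not a video URL.
function strictVideoId(input) {
  const m = STRICT.exec(input);
  return m ? m[1] : null;
}

console.log(strictVideoId("https://www.youtube.com/watch?v=DFYRQ_zQ-gk&feature=featured"));
console.log(strictVideoId("extra text youtube.com/embed/DFYRQ_zQ-gk extra text")); // null
console.log(strictVideoId("https://youtu.be/abc"));                                // null
```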

zmanplex
1

I tried this one and it works fine for me.

(?:http(?:s)?:\/\/)?(?:www\.)?(?:youtu\.be\/|youtube\.com\/(?:(?:watch)?\?(?:.*&)?v(?:i)?=|(?:embed|v|vi|user)\/))([^\?&\"'<> #]+)

You can check here https://regex101.com/r/Kvk0nB/1

Akash Jain
1

https://regexr.com/62kgd

^((http|https)\:\/\/)?(www\.youtube\.com|youtu\.?be)\/((watch\?v=)?([a-zA-Z0-9]{11}))(&.*)*$

https://www.youtube.com/watch?v=YPz9zqakRbk

https://www.youtube.com/watch?v=YPz9zqakRbk&t=11

http://youtu.be/cCnrX1w5luM&y=12

http://youtu.be/cCnrX1w5luM

http://youtube/cCnrXswsluM

www.youtube.com/cCnrX1w5luM

youtube/cCnrX1w5luM
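
One thing to be aware of with this pattern: the ID part is `[a-zA-Z0-9]{11}`, so IDs containing `-` or `_` are rejected. The check below shows that with an ID used elsewhere on this page. The `widened` variant, which adds `_` and `-` to the character class, is my own assumption about a possible adjustment and is not part of the answer.

```javascript
// Regex copied from the answer above.
const strict = /^((http|https)\:\/\/)?(www\.youtube\.com|youtu\.?be)\/((watch\?v=)?([a-zA-Z0-9]{11}))(&.*)*$/;
// Assumed variant (not from the answer): also allow _ and - in the 11-character ID.
const widened = /^((http|https)\:\/\/)?(www\.youtube\.com|youtu\.?be)\/((watch\?v=)?([a-zA-Z0-9_-]{11}))(&.*)*$/;

console.log(strict.test("https://www.youtube.com/watch?v=YPz9zqakRbk&t=11")); // true
console.log(strict.test("https://www.youtube.com/watch?v=DFYRQ_zQ-gk"));      // false
console.log(widened.test("https://www.youtube.com/watch?v=DFYRQ_zQ-gk"));     // true
```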

0

Modified from phuc77's answer, with these changes:

  • captures only the token; every other group is non-capturing
  • written over multiple lines with comments, using the x (PCRE_EXTENDED) flag (/x, or @x with the delimiters used here)
  • uses @ as the delimiter so that / can be used without escaping
  • does not escape - at the end of character lists.
    E.g. [\w-] not [\w\-]

Example at regex101 with an experimental inclusion of # Possible: oembed?url=...v=:

https://regex101.com/r/0pZCmF/1

$yttok_regex = <<<EOR
@^

# Possible: http://
#       https://
#       //
(?:(?:https?:)?//)?

# Possible: www.
#       m.
(?:(?:www|m)\.)?

# Possible: youtube.com
#       youtube-nocookie.com
#       youtu.be
(?:(?:youtube(?:-nocookie)?\.com|youtu\.be))?

# Possible: /[a-zA-Z0-9_-]+?v=
#       /embed/
#       /v/
(?:/(?:[\w-]+\?v=|embed/|v/)?)?

# TOKEN:    [a-zA-Z0-9_-]
([\w-]+)

# Possible:
#       Anything not space+
(?:\S+)?

# EOF pattern with x(PCRE_EXTENDED) flag:
$@x
EOR;

Optionally use:

# TOKEN:    [a-zA-Z0-9_-]
([\w-]{11})

To match only 11-char long tokens.

user3342816
  • (PS: as SO still refuses to honor 8-char wide tab, the lineup is not as nice as locally, - but hey. They also do what ever they can to mangle up the source, - which used to work. But they clearly hate tabs to eternity and back and would like all code to be non-indented. Likely negative indented just to make it even worse. But hey. Why use letters at all in code - perhaps we should start using emoticons instead. (The code would in many cases be more readable)) – user3342816 Apr 04 '23 at 14:02
-5

Check this pattern instead:

r'(?i)(http.//|https.//)*[A-Za-z0-9._%+-]+\.\w+'
rolandvarga