5

How can I get the YouTube video ID from a URL without resorting to a regular expression?

With the method below, the following URLs did not work:

http://www.youtube.com/e/dQw4w9WgXcQ

http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ

public static String extractYTId(String youtubeUrl) {
    String video_id = "";

    try {
        if(youtubeUrl != null && youtubeUrl.trim().length() > 0 && youtubeUrl.startsWith("http")) {
            String expression = "^.*((youtu.be" + "\\/)" + "|(v\\/)|(\\/u\\/w\\/)|(embed\\/)|(watch\\?))\\??v?=?([^#\\&\\?]*).*"; // var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#\&\?]*).*/;
            //String expression = "^.*(?:youtu.be\\/|v\\/|e\\/|u\\/\\w+\\/|embed\\/|v=)([^#\\&\\?]*).*";
            CharSequence input = youtubeUrl;
            Pattern pattern = Pattern.compile(expression, Pattern.CASE_INSENSITIVE);
            Matcher matcher = pattern.matcher(input);
            if(matcher.matches()) {
                String groupIndex1 = matcher.group(7);
                if(groupIndex1 != null && groupIndex1.length() == 11)
                    video_id = groupIndex1;
            }
        }
    } catch(Exception e) {
        Log.e("YoutubeActivity", "extractYTId " + e.getMessage());
    }

    return video_id;
}

The other links work fine:

http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0

http://www.youtube.com/embed/0zM3nApSvMg?rel=0

http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index

http://www.youtube.com/watch?v=0zM3nApSvMg

http://youtu.be/0zM3nApSvMg

http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s

http://youtu.be/dQw4w9WgXcQ

http://www.youtube.com/embed/dQw4w9WgXcQ

http://www.youtube.com/v/dQw4w9WgXcQ

http://www.youtube.com/watch?v=dQw4w9WgXcQ

http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0

Piraba

3 Answers

4

You can use the following regex:

^(?:(?:https?:\/\/)?(?:www\.)?)?(youtube(?:-nocookie)?\.com|youtu\.be)\/.*?(?:embed|e|v|watch\?.*?v=)?\/?([a-z0-9]+)

Regex breakdown:

  1. ^: Start-of-line anchor
  2. (?:(?:https?:\/\/)?(?:www\.)?)?: Optional scheme and www prefix
    • (?:https?:\/\/)?: Match http:// or https://, optionally
    • (?:www\.)?: Match www. zero or one time
  3. (youtube(?:-nocookie)?\.com|youtu\.be)\/: Match any of
    • youtube.com, youtube-nocookie.com or youtu.be, followed by /
  4. .*?: Lazy match. Consume as few characters as possible until the rest of the pattern can match.
  5. (?:embed|e|v|watch\?.*?v=)?\/?:
    • (?:embed|e|v|watch\?.*?v=)?: Match embed, e, v, everything from watch? up to v=, or nothing
    • \/?: Match / zero or one time
  6. ([a-z0-9]+): Match one or more alphanumeric characters and capture them in group 2 (the i flag makes the match case-insensitive).

Live demo using JavaScript:

var regex = /^(?:(?:https?:\/\/)?(?:www\.)?)?(youtube(?:-nocookie)?\.com|youtu\.be)\/.*?(?:embed|e|v|watch\?.*?v=)?\/?([a-z0-9]+)/i;

// An array of all the youtube URLs
var youtubeLinks = [
    'http://www.youtube.com/e/dQw4w9WgXcQ',
    'http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ',
    'http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0',
    'http://www.youtube.com/embed/0zM3nApSvMg?rel=0',
    'http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index',
    'http://www.youtube.com/watch?v=0zM3nApSvMg',
    'http://youtu.be/0zM3nApSvMg',
    'http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s',
    'http://youtu.be/dQw4w9WgXcQ',
    'http://www.youtube.com/embed/dQw4w9WgXcQ',
    'http://www.youtube.com/v/dQw4w9WgXcQ',
    'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
    'http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0'
];

// An object to store the results
var youtubeIds = {};

// Iterate over the youtube URLs
youtubeLinks.forEach(function(url) {
    // Get the value of second captured group to extract youtube ID
    var id = "<span class='youtubeId'>" + (url.match(regex) || [0, 0, 'No ID present'])[2] + "</span>";

    // Add the URL and the extracted ID in the result object
    youtubeIds[url] = id;
});

// Log the object in the browser console
console.log(youtubeIds);

// To show the result on the page
document.getElementById('output').innerHTML = JSON.stringify(youtubeIds, 0, 4);

.youtubeId {
    color: green;
    font-weight: bold;
}

<pre id="output"></pre>

RegEx Visualization Diagram
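
Since the question is about Java, here is a minimal sketch of how the same pattern might be used there. The class name `YouTubeIdRegex` and the method `extractId` are made up for illustration. The pattern is written as a Java string (backslashes doubled, no `/.../i` delimiters, with `Pattern.CASE_INSENSITIVE` standing in for the `i` flag), and `Matcher.find()` is used because the pattern is not meant to span the whole URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YouTubeIdRegex {
    // Same pattern as in the answer; "/" needs no escaping in a Java regex.
    private static final Pattern YT_PATTERN = Pattern.compile(
        "^(?:(?:https?://)?(?:www\\.)?)?"
            + "(youtube(?:-nocookie)?\\.com|youtu\\.be)/"
            + ".*?(?:embed|e|v|watch\\?.*?v=)?/?([a-z0-9]+)",
        Pattern.CASE_INSENSITIVE);

    // Returns the text captured by group 2, or an empty string if nothing matches.
    static String extractId(String url) {
        if (url == null) {
            return "";
        }
        Matcher matcher = YT_PATTERN.matcher(url.trim());
        // find() looks for a match at the start of the input without
        // requiring the pattern to consume the entire URL.
        return matcher.find() ? matcher.group(2) : "";
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.youtube.com/e/dQw4w9WgXcQ",
            "http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ",
            "http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + extractId(url));
        }
    }
}
```

One thing to keep in mind: `[a-z0-9]+` stops at `-` or `_`, both of which can appear in video IDs, so widening the class to `[\\w-]+` may be worth considering.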

m4n0
Tushar
  • @Piraba I think you need to double the backslashes when adding the regex as a string. – Tushar Feb 17 '16 at 08:19
  • I doubled the backslashes: `String expression = "/^(?:(?:https?:\\/\\/)?(?:www\\.)?)?(youtube(?:-nocookie)?\\.com|youtu\\.be)\\/.*?(?:embed|e|v|watch\\?.*?v=)?\\/?([a-z0-9]+)/i";`. It is still not working. – Piraba Feb 17 '16 at 08:31
  • 1
    @Piraba You need to use `if(matcher.find())` instead of `if(matcher.matches())` and print the group 2. Sample: `if (matcher.find()) { video_id = matcher.group(2); }` – Tunaki Feb 17 '16 at 14:40
  • @Tushar - That "flow-illustration" at the bottom looks generated.. how, where ?!? – T4NK3R Feb 27 '16 at 19:44
  • @T4NK3R [regexper.com](http://regexper.com/#%2F%5E(%3F%3A(%3F%3Ahttps%3F%3A%5C%2F%5C%2F)%3F(%3F%3Awww%5C.)%3F)%3F(youtube(%3F%3A-nocookie)%3F%5C.com%7Cyoutu%5C.be)%5C%2F.*%3F(%3F%3Aembed%7Ce%7Cv%7Cwatch%5C%3F.*%3Fv%3D)%3F%5C%2F%3F(%5Ba-z0-9%5D%2B)%2Fi) is an example. The above is generated by Atom editor with `regex-railroad-diagram` package. – Tushar Feb 28 '16 at 05:24
  • The drawback of this answer is that if an unknown URL pattern comes up, the method won't be able to extract the video ID. That risk may be addressed with the following method: http://stackoverflow.com/a/39742707/363573. – Stephan Sep 28 '16 at 09:08
  • I found that this solution worked when other URL parameters like time_continue were included in the string, a case the other regexes in this thread didn't catch. – kylegill Apr 03 '18 at 20:05
1

Your regex is designed for the youtu.be domain, so of course it doesn't work with the youtube.com one.

  1. Construct a java.net.URL (https://docs.oracle.com/javase/7/docs/api/java/net/URL.html) from your URL string
  2. Use URL#getQuery() to get the query part
  3. See "Parse a URI String into Name-Value Collection" for ways to decode the query part into a name-value map, and get the value for the name 'v'
  4. If there is no query part (as in http://www.youtube.com/e/dQw4w9WgXcQ), use URL#getPath() (which will give you /e/dQw4w9WgXcQ) and parse the video ID from it, e.g. by skipping the first 3 characters: url.getPath().substring(3)
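
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the class name `YouTubeIdFromUrl` and the method `extractId` are hypothetical, `java.net.URI` is swapped in for `java.net.URL` (it offers the same `getQuery()` and `getPath()` accessors), the query is split by hand rather than with a helper library, and the last path segment is taken instead of `substring(3)` so that `/embed/` paths are covered too.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class YouTubeIdFromUrl {
    // Returns the value of the "v" query parameter if present,
    // otherwise the last segment of the path; "" if the link cannot be parsed.
    static String extractId(String link) {
        if (link == null) {
            return "";
        }
        final URI uri;
        try {
            uri = new URI(link.trim());
        } catch (URISyntaxException e) {
            return "";
        }

        // Steps 2 and 3: look for a "v" parameter in the query part.
        String query = uri.getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals("v")) {
                    return pair.substring(eq + 1);
                }
            }
        }

        // Step 4: no "v" parameter, so fall back to the path,
        // e.g. "/e/dQw4w9WgXcQ" or "/embed/dQw4w9WgXcQ".
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(extractId("http://www.youtube.com/e/dQw4w9WgXcQ"));
        System.out.println(extractId("http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ"));
    }
}
```

The sketch does not check the host name or the shape of the ID, so a caller would probably still want to verify both (for example, that the result is 11 characters long, as the code in the question does).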

Update: why not a regex? Because the standard JDK URL parser is much more robust. It is tested by the whole Java community, while a regex-based reinvented wheel is tested only by your own code.

Kirill Gamazkov
0

I use this function for all YouTube video IDs: pass in the URL and it returns only the ID. See the fiddle below.

var ytSrc = function (url) {
    // Group 7 captures the text after the recognized prefix, up to the next #, & or ?
    var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#\&\?]*).*/;
    var match = url.match(regExp);
    // Accept the capture only if it has the 11-character length of a video ID
    if (match && match[7].length == 11) {
        return match[7];
    } else {
        alert("Invalid URL");
    }
};

https://jsfiddle.net/keinchy/tL4thwd7/1/

kylegill
Frederick Jaime