
I want to get all .mp4 URLs from this String using regex.

I also want to know how to get only the last .mp4 URL using regex.

Thanks

contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8}, 

Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd}, 

Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4}, 

Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4}, 

Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]";
user7453632

2 Answers


Regex:

https?.*?\.mp4

Starts with the literal text: http

Followed by an optional 's': s?

Remove the question mark if all the URLs use HTTPS.

Followed by as few characters as possible: .*?

Followed by the .mp4 extension, with the dot escaped so it matches a literal dot: \.mp4
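A minimal Java sketch of applying this idea with `Matcher.find()` to the question's single-line string, collecting every match and then taking the last one. One adjustment is an assumption of this sketch, not part of the answer: on a single line, the lazy `.*?` in `https?.*?\.mp4` still starts at the first URL (the .m3u8 one) and stretches across the following entries up to the first `.mp4`. The sketch therefore requires `://` and replaces `.*?` with `[^\s,}]+?`, assuming the URLs never contain whitespace, commas or `}`. The class, constant and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Mp4Urls {
    // Adjusted form of https?.*?\.mp4 (an assumption, not the answer's exact regex):
    // "://" is required and the URL body may not contain whitespace, ',' or '}',
    // so a match cannot stretch from one Variant entry into the next.
    static final Pattern MP4_URL = Pattern.compile("https?://[^\\s,}]+?\\.mp4");

    // The string from the question, joined onto a single line.
    static final String SAMPLE =
        "contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8}, "
        + "Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd}, "
        + "Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4}, "
        + "Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4}, "
        + "Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]";

    // Collects every .mp4 URL in the input, in order of appearance.
    static List<String> findMp4Urls(String input) {
        List<String> urls = new ArrayList<>();
        Matcher m = MP4_URL.matcher(input);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = findMp4Urls(SAMPLE);
        urls.forEach(System.out::println);

        // The last .mp4 URL is simply the last element of the list.
        if (!urls.isEmpty()) {
            System.out.println("Last: " + urls.get(urls.size() - 1));
        }
    }
}
```

Running `main` prints the 320x180, 640x360 and 1280x720 URLs, then the 1280x720 URL again as the last one. Collecting all matches and indexing the list keeps a single regex for both parts of the question.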

Michael
    *This should be the accepted answer!* However, it would be helpful if you could provide some more information on how to exclude some strings (like `jpg`) within the URL, i.e. I don’t want to match `` – ixany Oct 01 '18 at 08:56

2 Approaches:

  1. If you're sure the URLs will always begin with https:// and that nothing after the end of a URL contains "mp4", you can use pattern = "https://.*mp4"; (the greedy .* runs to the last "mp4" on the line, so each string is matched separately):

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Approach1 {
        public static void main(String[] args) {
            String[] arr = {
                "contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8}",
                "Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd}",
                "Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4}",
                "Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4}",
                "Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]"
            };
            // Greedy ".*": matches from the first "https://" to the LAST "mp4" in each string
            String pattern = "https://.*mp4";
            Pattern r = Pattern.compile(pattern);

            // Each array element is searched on its own, so matches cannot span entries
            for (String line : arr) {
                Matcher m = r.matcher(line);
                if (m.find()) {
                    System.out.println(m.group(0));
                } else {
                    System.out.println("NO MATCH");
                }
            }
        }
    }
    
  2. If not, to support all types of URLs, change your pattern to a general-purpose URL pattern with a small modification (appending "mp4" after a word boundary):

    // General-purpose URL pattern; the trailing \\b + "mp4" means a match must end in "mp4"
    String pattern = 
        "(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www\\.)" + 
        "(\\w+:\\w+@)?(([-\\w]+\\.)+(com|org|net|gov" + 
        "|mil|biz|info|mobi|name|aero|jobs|museum" + 
        "|travel|[a-z]{2}))(:[\\d]{1,5})?" + 
        "(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" + 
        "((\\?([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" + 
        "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" + 
        "(&(?:[-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" + 
        "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" + 
        "(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b"+"mp4";
    

Output (approach 1):

NO MATCH
NO MATCH
https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4
https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4
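For the second part of the question (only the last .mp4 URL), the greedy `.*` that makes approach 1 over-match on a single-line string can be put to use: a leading greedy `.*` first runs to the end of the input and then backtracks, so a capture group placed after it lands on the last URL that matches. A minimal Java sketch, assuming URLs contain no whitespace, commas or `}` (the `[^\s,}]` class is an assumption about the input, not part of either answer); the class, constant and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastMp4Url {
    // Greedy ".*" consumes the whole input, then backtracks one character at a time,
    // so group 1 ends up at the right-most position where a .mp4 URL can match.
    // (?s) lets "." also cross line breaks in case the input is not a single line.
    // The URL character class [^\s,}] is an assumption about the input format.
    static final Pattern LAST_MP4_URL = Pattern.compile("(?s).*(https?://[^\\s,}]+?\\.mp4)");

    // The string from the question, joined onto a single line.
    static final String SAMPLE =
        "contentType=application/x-mpegURL, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.m3u8}, "
        + "Variant{bitrate=0, contentType=application/dash+xml, url=https://video.twimg.com/amplify_video/822938952332144642/pl/BjHU8aBCbOgZNzXQ.mpd}, "
        + "Variant{bitrate=320000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/320x180/YqZ72rzLj3VWVhy4.mp4}, "
        + "Variant{bitrate=832000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/640x360/A2vMgzo2ElpPP6TE.mp4}, "
        + "Variant{bitrate=2176000, contentType=video/mp4, url=https://video.twimg.com/amplify_video/822938952332144642/vid/1280x720/j9xbNzRZqEbYs_2s.mp4}]}]";

    // Returns the last .mp4 URL in the input, or null if there is none.
    static String lastMp4Url(String input) {
        Matcher m = LAST_MP4_URL.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastMp4Url(SAMPLE));
    }
}
```

Because the backtracking cost grows with input length, this regex-only form suits short strings like the one in the question; for long inputs, collecting all matches with a `find()` loop and keeping the last one avoids the backtracking.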
Kishore Bandi
    The first approach will not work since my URLs are in one single-line string, not on separate lines or in an array. But the second approach works perfectly, thanks. – user7453632 Jan 22 '17 at 20:55