Find regex expression to find link from html

Question

I am trying to extract the link that follows hls: in the HTML below using a regex. I tried (r"(?<=hls:\s\')(.*)"), but it returns only part of the link: https://mvd4.ddns.me:443/1vod5n/almajde-ben-zaher-1. Any suggestions?

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>RikTak Video Player - Version 1</title>
    <script src="https://cdn.radiantmediatechs.com/rmp/5.2.1/js/rmp.min.js"></script>
    <style>
        body {
            margin: 0;
        }
    </style>
</head>
<body>
<div id="rmpPlayer"></div>
<script>
    var bitrates = {
         hls: 'https://mvd4.ddns.me:443/1vod5n/almajde-ben-zaher-1.mp4/playlist.m3u8?wmsAuthSign=c2VydmVyX3RpbWU9MTAvMjQvMjAxOSA3OjUyOjA2IEFNJmhhc2hfdmFsdWU9WjIxaHNDcTZDMXEzTmM4ZTFTU0RIUT09JnZhbGlkbWludXRlcz02MA=='
    };

        var schedule = {
       preroll: [
            'https://googleads.g.doubleclick.net/pagead/ads?ad_type=video_image&client=ca-video-pub-1231661633440980&description_url=https%3A%2F%2Fwww.farfeshplus.com&channel=7962520214&videoad_start_delay=0&hl=ar'
            ],
        midroll: [

            [600,'https://googleads.g.doubleclick.net/pagead/ads?ad_type=video_text_image&client=ca-video-pub-1231661633440980&description_url=https%3A%2F%2Fwww.farfeshplus.com&channel=7962520214&videoad_start_delay=0&hl=ar'],
            [1200,'https://googleads.g.doubleclick.net/pagead/ads?ad_type=video_text_image&client=ca-video-pub-1231661633440980&description_url=https%3A%2F%2Fwww.farfeshplus.com&channel=7962520214&videoad_start_delay=0&hl=ar'],

            [1800,'https://googleads.g.doubleclick.net/pagead/ads?ad_type=video_text_image&client=ca-video-pub-1231661633440980&description_url=https%3A%2F%2Fwww.farfeshplus.com&channel=7962520214&videoad_start_delay=0&hl=ar']
            ],
        postroll: [
            'https://googleads.g.doubleclick.net/pagead/ads?ad_type=video_text_image&client=ca-video-pub-1231661633440980&description_url=https%3A%2F%2Fwww.farfeshplus.com&channel=7962520214&videoad_start_delay=0&hl=ar'
        ]
    };
        var settings = {
        licenseKey: 'Kl8lNHNrNzkyY3M5dj9yb201ZGFzaXMzMGRiMEElXyo=',
        bitrates: bitrates,
        delayToFade: 3000,
        width: 750,
        height: 440,
        skin: 's4',
        hlsJSMaxBufferSize: 0,
        hlsJSMaxBufferLength: 240,
        poster: 'https://www.farfeshplus.com/ramadanimages/1443.jpg',
        ads: true,
        adSchedule: schedule
    };
    var elementID = 'rmpPlayer';
    var rmp = new RadiantMP(elementID);
    rmp.init(settings);
</script>
</body>
</html>

I think it should work right? https://regex101.com/r/FCQL63/1 — The fourth bird, Oct 24 '19 at 10:57
Maybe you should show the code you tried. As said the previous comment, the regexp works fine. — Amessihel, Oct 24 '19 at 11:01
Possible duplicate of [What is the best regular expression to check if a string is a valid URL?](https://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url) — Kris, Oct 24 '19 at 11:08

score 0 · Answer 1 · answered Oct 24 '19 at 11:01

I would use Beautiful Soup to first parse and obtain the content for the <script> tag. Then, use regex to extract the link you want.

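import re
from bs4 import BeautifulSoup

# `page` is assumed to be the response already fetched for the player page, e.g. via requests.get(...)
soup = BeautifulSoup(page.content, 'html.parser')
# Search every inline <script>; the first <script> tag only loads rmp.min.js and has no text
scripts = ' '.join(s.string or '' for s in soup.find_all('script'))
m = re.search(r"var bitrates = \{\s+hls: '([^']+)'\s+\};", scripts)
if m:
    print(m.group(1))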

The problem with using regex alone is that you really need a parser here to handle arbitrarily nested HTML content. Regex was not designed for this task.
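
The parse-first, regex-second idea can also be sketched with only the standard library. This is an illustration, not the answer's own code: it swaps Beautiful Soup for `html.parser` so it runs without third-party packages, and `SAMPLE_HTML`, `ScriptCollector`, `find_hls_url`, and the example.com URL are all hypothetical stand-ins for the page in the question.

```python
import re
from html.parser import HTMLParser

# Shortened stand-in for the page in the question; the URL is a placeholder.
SAMPLE_HTML = """
<html><head>
<script src="https://example.com/player.js"></script>
</head><body>
<script>
    var bitrates = {
         hls: 'https://example.com/video.mp4/playlist.m3u8?token=abc123=='
    };
</script>
</body></html>
"""


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the current entry.
        if self.in_script:
            self.scripts[-1] += data


def find_hls_url(html):
    """Return the quoted value after `hls:` in any inline script, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    for text in collector.scripts:
        match = re.search(r"hls:\s*'([^']+)'", text)
        if match:
            return match.group(1)
    return None


print(find_hls_url(SAMPLE_HTML))
```

Capturing with `[^']+` rather than `.*` means the match runs up to the closing quote even if the value happens to contain a line break, which `.` does not match by default.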

Tim Biegeleisen

It prints the rest of the link on a new line :( – Ibtsam Ch Oct 24 '19 at 11:38
I want the whole URL on the same line. What should I do? – Ibtsam Ch Oct 24 '19 at 13:04
What output does my script currently give you, and why is it wrong? – Tim Biegeleisen Oct 24 '19 at 13:06
It doesn't give wrong output; it just prints the part of the link after "https://mvd4.ddns.me:443/1vod5n/almajde-ben-zaher-1" on a new line. I just want the link as a whole on the same line. What should I do to get that? – Ibtsam Ch Oct 24 '19 at 13:10
Kindly guide me on how I can do this. – Ibtsam Ch Oct 24 '19 at 13:27
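
The thread never establishes where the line break comes from. If the fetched page really contains a newline inside the quoted URL, that would be consistent with `(.*)` stopping early, since `.` does not match a newline by default. A minimal sketch of how one might check and clean the value, assuming the captured text is held in a variable (called `captured` here, a hypothetical name with a made-up value):

```python
import re

# Hypothetical captured value that contains a line break in the middle.
captured = "https://example.com/video-1\n.mp4/playlist.m3u8?token=abc123=="

# repr() makes hidden characters such as \n or \r visible.
print(repr(captured))

# Remove every whitespace character; a valid URL contains none.
url = re.sub(r"\s+", "", captured)
print(url)
```

If `repr()` shows no `\n` or `\r`, the string is already a single line and the break is most likely just the terminal wrapping a long line.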
