I have the following HTML:

<div class="main-details mt10">
    <div class="container">
        <div class="row">
            <div class="col-lg-8 col-md-7" data-purpose="introduction">
                                    <div class="slp-jwplayer-communicator" data-fade-in="1"
                         data-playerhtml='            <iframe id="hh"
                    src="https://localhost/embed/video/E0cZc345xCVTXwT/?params%5Bvars%5D%5Bplaylist%5D%5B0%5D%5Bimage%5D=https%3A%2F%2Flocalhost.images.com%2Fckxit%2F750x422%2F469292_6c3e_5.jpg&params%5BtrackVideoPlay%5D=true"
                    width="100%"
                    height="100%"
                    frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen
                    style="background: black;">
            </iframe>
        '>
                        <div class="promo-asset-content stretchy-wrapper ud-courseimpressiontracker"
                             data-id="erew343423"
                             data-tracking-type="proms"
                            >
                            <div>
                                <img class="cth" src="https://lcoalhost/data/469292_6c3e_5.jpg"/>
                            </div>
                        </div>
                    </div>
                            </div>
            <div class="col-lg-4 col-md-5">
                <div class="row fxdc lf-wrap-md">
                    <div class="fxw-md -md db-xs">
                        <div class="right-top col-md-12 col-sm-6">

<div class="take-btn">
            <div class="price fxac">

                    </div>

            <a class="ct "
       data-requireLogin="true"
       data-les="button-enroll-b"
       data-padding="0"
       data-passDtCode="true"
       data-purpose="take-this"
       href="https://localhost/code=kKp5D213TWOo">
        Take </a>

I want to find jwplayer and get everything inside the quotes of the src attribute that follows it:

jwplayer-communicator" data-fade-in="1"
data-playerhtml=' <iframe id="4222780"
src="https://localhost/embed/video/E0cZc345xCVTXwT/?params%5Bvars%5D%5Bplaylist%5D%5B0%5D%5Bimage%5D=https%3A%2F%2Flocalhost.images.com%2Fckxit%2F750x422%2F469292_6c3e_5.jpg&params%5BtrackVideoPlay%5D=true"

Expected result:

https://localhost/embed/video/E0cZc345xCVTXwT/?params%5Bvars%5D%5Bplaylist%5D%5B0%5D%5Bimage%5D=https%3A%2F%2Flocalhost.images.com%2Fckxit%2F750x422%2F469292_6c3e_5.jpg&params%5BtrackVideoPlay%5D=true

However, the code below matches everything from jwplayer onward and returns text beyond the expected result.

data = re.search(r'jwplayer.*src=\"(.*?)\"', html, re.MULTILINE | re.DOTALL).group(1)

How can I just get everything in between src=" and " provided that it's right after jwplayer?

edit

OK, I get it: an HTML parser is better suited to this type of problem (HTML). But suppose I'm just curious how to do this with a regex; can anyone please help me? The information is useful, since I might run into the same problem in a plain text file in the future. Moreover, even if I use an HTML parser, I will still need some regex no matter what.

momokjaaaaa
  • I recommend using a parser instead. – Maroun Oct 30 '15 at 11:11
  • Why are you trying to parse HTML with regex? – jonrsharpe Oct 30 '15 at 11:15
  • I just want to find a link. It's from a session. – momokjaaaaa Oct 30 '15 at 11:16
  • What the previous two comments mean is that regex is not the way to go. You should read [this](http://stackoverflow.com/a/1732454/3100115). Consider using [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/#) or [lxml](http://lxml.de/) – styvane Oct 30 '15 at 11:19
  • But I have to use regex as well to find jwplayer and src, don't I? – momokjaaaaa Oct 30 '15 at 11:22
  • I'd rather use `lxml` and `xpath` to accomplish that. –  Oct 30 '15 at 11:29
  • I believe you don't know how to start here: the first thing you need to do is parse the information, looking for the tag, containing "jwplayer" information. This information will consist of different lines. Then you can use regular expressions for finding the line containing the "src" and the corresponding values. – Dominique Oct 30 '15 at 12:22
  • @user3100115 I'd upvote that if you put it in an answer (and messaged me to remind me.) – Jonathan Mee Oct 30 '15 at 13:31
  • Thank you, everyone. I get it: an HTML parser is better suited to this problem. But how can I do it using regex? See my updated question. – momokjaaaaa Oct 30 '15 at 14:17
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – AlG Oct 30 '15 at 14:30
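
For reference, the parser route suggested in the comments can be sketched with only the standard library's `html.parser`; BeautifulSoup or lxml would be shorter. This is a rough sketch, not a definitive implementation: `AttrFinder`, `find_attr`, and the shortened `page` sample are hypothetical names and data made up for illustration. It assumes the iframe lives as text inside the `data-playerhtml` attribute, as in the question, so it is parsed in two passes and no regex is involved.

```python
from html.parser import HTMLParser


class AttrFinder(HTMLParser):
    """Record one attribute of the first start tag accepted by `accept`."""

    def __init__(self, accept, attr):
        super().__init__()
        self.accept = accept  # callable(tag, attrs_dict) -> bool
        self.attr = attr      # name of the attribute to read
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.value is None and self.accept(tag, attrs):
            self.value = attrs.get(self.attr)


def find_attr(markup, accept, attr):
    """Return `attr` of the first matching tag in `markup`, or None."""
    finder = AttrFinder(accept, attr)
    finder.feed(markup)
    finder.close()
    return finder.value


# Shortened stand-in for the markup in the question (hypothetical sample).
page = """<div class="slp-jwplayer-communicator" data-fade-in="1"
     data-playerhtml='<iframe id="hh"
         src="https://localhost/embed/video/E0cZc345xCVTXwT/"
         width="100%"></iframe>'>
  <img class="cth" src="https://localhost/data/469292_6c3e_5.jpg"/>
</div>"""

# Pass 1: the iframe is stored as text inside the data-playerhtml attribute.
player_html = find_attr(
    page,
    lambda tag, a: tag == "div" and "jwplayer" in (a.get("class") or ""),
    "data-playerhtml",
)

# Pass 2: parse that text as HTML in its own right and read the iframe's src.
src = find_attr(player_html, lambda tag, a: tag == "iframe", "src")
print(src)
```

The second pass is needed because, to the parser, the iframe is only an attribute value of the outer div, not a tag of its own.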

1 Answer

Just add a "?" after the first ".*" to make it non-greedy, so that it stops at the first src=" after jwplayer instead of the last one:

r'jwplayer.*?src=\"(.*?)\"'
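
A small sketch of the difference, assuming a shortened stand-in for the question's markup (the `html` sample and the variable names `greedy` and `lazy` are made up for illustration). With `re.DOTALL`, the greedy `.*` runs to the end of the text and backtracks to the last `src="`, while the lazy `.*?` stops at the first one after jwplayer. `re.MULTILINE` is left out because the pattern has no `^` or `$` anchors, so it would have no effect.

```python
import re

# Shortened stand-in for the markup in the question (hypothetical sample).
html = """<div class="slp-jwplayer-communicator" data-fade-in="1"
     data-playerhtml='<iframe id="hh"
         src="https://localhost/embed/video/E0cZc345xCVTXwT/"
         width="100%"></iframe>'>
  <img class="cth" src="https://localhost/data/469292_6c3e_5.jpg"/>
</div>"""

# Greedy: ".*" grabs as much as possible, so the LAST src="..." is captured.
greedy = re.search(r'jwplayer.*src="(.*?)"', html, re.DOTALL).group(1)

# Lazy: ".*?" grabs as little as possible, so the FIRST src="..." after
# "jwplayer" is captured.
lazy = re.search(r'jwplayer.*?src="(.*?)"', html, re.DOTALL).group(1)

print(greedy)  # the <img> URL
print(lazy)    # the <iframe> URL
```

Note that `re.search` returns `None` when nothing matches, so real input would need a check before calling `.group(1)`.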