
I have a string called $code. It holds sets of data as shown below. I am trying to parse the values of href=, channel= and src=. I tried to use preg_match_all, but I got no data! Could anyone show me the best way to parse this data? Thanks in advance.

Value of $code:

        <div class="new"> <a class="block" target="_blank" href="http://somesite:8080/hls/mango1.m3u8?token=34523sedfsdfsdf&e=123456789&channel=mango1" data-toggle="modal" data-target="#mango1">
<div class="image-container"> <img src="images/mango1.png" class="img-responsive" > </div>
</a> </div>

        <div class="new"> <a class="block" target="_blank" href="http://somesite:8080/hls/mango2.m3u8?token=sfaesfraesgh452342&e=987654321&channel=mango2" data-toggle="modal" data-target="#mango2">
<div class="image-container"> <img src="images/mango2.png" class="img-responsive" > </div>
</a> </div>

PHP code:

preg_match_all("#target=\"_blank\" href=\"([^<]+)\" data-toggl", $code, $foo2);

var_dump($foo2[1]); 
print_r($foo2[1]);

Edit: I tried using DOM and got the value of href, but how do I get the value of src=?

$dom = new DOMDocument;
$dom->loadHTML($code);
$xpath = new DOMXPath($dom);

$nodeList = $xpath->query('//a[@class="block"]');
foreach ($nodeList as $node) {
    $href = $node->getAttribute('href');
    $imageurl = $node->getAttribute('src');

    echo "<br>".$href;
    echo "<br>".$imageurl;

}
user1788736
  • You need to use DOMDocument and DOMXPath; search for a tutorial about them. You can take a look here: http://www.phptutorial.info/?domxpath.query – Casimir et Hippolyte Jan 10 '16 at 23:09
  • Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – Guildencrantz Jan 10 '16 at 23:16
  • Thanks for the replies. Casimir et Hippolyte, I edited my first post: using DOM I got the value of href, but how do I get the value of src=? – user1788736 Jan 10 '16 at 23:28
  • In your code you get each "a" node that has a class attribute of "block". In the same way, you can build another query *(inside the foreach loop)* that searches from each node (see DOMXPath::query in the PHP manual *(the second parameter)*) for a descendant img node and gets its src attribute. – Casimir et Hippolyte Jan 10 '16 at 23:43
  • Something like this: https://eval.in/500214 – Casimir et Hippolyte Jan 11 '16 at 00:51
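The eval.in demo linked in the last comment is not reproduced here, so what follows is a minimal sketch of the approach that comment describes, assuming $code holds the HTML from the question. The $results array, the string(.//img/@src) expression, and the use of parse_url()/parse_str() to pull channel out of the href are illustrative choices and do not come from the thread. The key point is the second argument to DOMXPath::evaluate(): it is the context node, so .//img searches only inside the current <a>.

```php
<?php
// Sample input: the two blocks from the question.
$code = <<<'HTML'
<div class="new"> <a class="block" target="_blank" href="http://somesite:8080/hls/mango1.m3u8?token=34523sedfsdfsdf&e=123456789&channel=mango1" data-toggle="modal" data-target="#mango1">
<div class="image-container"> <img src="images/mango1.png" class="img-responsive" > </div>
</a> </div>

<div class="new"> <a class="block" target="_blank" href="http://somesite:8080/hls/mango2.m3u8?token=sfaesfraesgh452342&e=987654321&channel=mango2" data-toggle="modal" data-target="#mango2">
<div class="image-container"> <img src="images/mango2.png" class="img-responsive" > </div>
</a> </div>
HTML;

// The bare "&" in the query strings can make libxml emit parser warnings;
// collect them internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($code);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$results = [];

foreach ($xpath->query('//a[@class="block"]') as $node) {
    $href = $node->getAttribute('href');

    // Context node = $node: look for an <img> descendant of this <a> only,
    // and return its src attribute as a plain string ("" if there is none).
    $src = $xpath->evaluate('string(.//img/@src)', $node);

    // channel= is part of the href query string; let PHP split it up.
    parse_str((string) parse_url($href, PHP_URL_QUERY), $params);
    $channel = $params['channel'] ?? '';

    $results[] = ['href' => $href, 'channel' => $channel, 'src' => $src];
}

print_r($results);
```

Each entry of $results then holds href, channel and src for one block, which covers all three values the question asks for without a regex.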

1 Answer


I see that several people in the comments have suggested the DOM method, and that's cool. Unfortunately, I am still learning how to use DOM myself, so I can't really clear up your questions about it. But I can show you how to use preg_match_all to parse your data the way you were trying to in your example.

The REGEX that I came up with was this:

\s*<div class="new">.*?href="((?:.*?)channel=(.*?))".*?src="(.*?)".*?</a>\s*</div>

Here's what it does:

  • \s* - Look for whitespace \s that may be present any number of times *.
  • <div class="new"> - Locate that exact div.
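  • .*? - I use this a few times throughout the expression. It means: match any character ., any number of times *, but as few as possible ?, stopping as soon as the next part of the expression can match. Because each block spans several lines, the pattern needs the s modifier; without it, . does not match newline characters and nothing will match.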
  • href=" - This is the next part of the expression. We are literally matching the string href=".
  • ((?:.*?) - First we open the parenthesis ( that will capture the full URL. Immediately after that, we start another group that matches anything up to "channel" .*?. I added ?: to the front of this group to tell the regex engine not to capture it on its own. (We are going to capture the entire URL and don't need just this part.)
  • channel= - Match the string channel= literally.
  • (.*?))" - Match whatever comes after channel= up to the next quotation mark ". We put this in parentheses because we want to capture it for later use. We also close the parenthesis we opened a couple of steps ago to capture the full URL.
  • .*?src=" - Match anything up to src=", then match src=" literally.
  • (.*?)" - Capture the value of whatever is following src=" up through the closing quotation marks ".
  • .*? - Match anything after that, up to the closing </a> tag.
  • </a>\s*</div> - Match a closing "a" tag </a> that can be followed by whitespace characters \s*, followed by a closing "div" tag </div>.
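The bullet points map onto the pattern piece by piece. As an illustrative sketch (not the answer's own code), the same pattern can be written with PHP's x (extended) modifier so that each piece carries its own comment. Two adjustments are needed for that, and both are assumptions of this sketch rather than part of the original: x treats # as the start of a comment, so ~ is used as the delimiter instead of #, and x ignores unescaped whitespace, so the space in <div class="new"> is escaped. The s modifier is added so that . can cross the line breaks inside each block. The names $code, $pattern, $found and $m are just for illustration.

```php
<?php
// One block of sample input from the question.
$code = <<<'HTML'
        <div class="new"> <a class="block" target="_blank" href="http://somesite:8080/hls/mango1.m3u8?token=34523sedfsdfsdf&e=123456789&channel=mango1" data-toggle="modal" data-target="#mango1">
<div class="image-container"> <img src="images/mango1.png" class="img-responsive" > </div>
</a> </div>
HTML;

// Same pattern, one piece per line. With "x", unescaped whitespace is ignored
// and "#" starts a comment that runs to the end of the line.
$pattern = '~
    \s*                  # optional leading whitespace
    <div\ class="new">   # the wrapper div (space escaped because of x)
    .*?                  # anything, as little as possible, up to ...
    href="               # ... the literal href="
    (                    # group 1: the full URL
      (?:.*?)            #   everything before channel= (not captured separately)
      channel=           #   the literal channel=
      (.*?)              #   group 2: the channel value
    )"                   # close group 1 at the closing quote
    .*?src="             # anything up to the literal src="
    (.*?)"               # group 3: the image path
    .*?</a>\s*</div>     # the rest of the block
~sx';

$found = preg_match($pattern, $code, $m);
print_r($m);
```

$m[0] is the whole matched block and $m[1] to $m[3] are the three groups, in the same order as the list that follows.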

From this, the captured groups will be as follows:

  1. href
  2. channel
  3. src
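To see the three groups in practice, here is a minimal sketch using preg_match_all() on the question's data. The ideone demo is not reproduced here, so details may differ from it. The s modifier is added because each block spans several lines and . does not match newlines by default; the last line checks that the same pattern finds nothing without it. PREG_SET_ORDER makes each entry of $matches hold one block ([1] = href, [2] = channel, [3] = src). With the default PREG_PATTERN_ORDER, as in the question's own code, $matches[1] would instead hold all hrefs, $matches[2] all channels and $matches[3] all srcs. The variable names are illustrative.

```php
<?php
// Sample input from the question.
$code = <<<'HTML'
        <div class="new"> <a class="block" target="_blank" href="http://somesite:8080/hls/mango1.m3u8?token=34523sedfsdfsdf&e=123456789&channel=mango1" data-toggle="modal" data-target="#mango1">
<div class="image-container"> <img src="images/mango1.png" class="img-responsive" > </div>
</a> </div>

        <div class="new"> <a class="block" target="_blank" href="http://somesite:8080/hls/mango2.m3u8?token=sfaesfraesgh452342&e=987654321&channel=mango2" data-toggle="modal" data-target="#mango2">
<div class="image-container"> <img src="images/mango2.png" class="img-responsive" > </div>
</a> </div>
HTML;

// The pattern from this answer, with the "s" modifier so "." also matches newlines.
$pattern = '#\s*<div class="new">.*?href="((?:.*?)channel=(.*?))".*?src="(.*?)".*?</a>\s*</div>#s';

// One entry per block: [1] = href, [2] = channel, [3] = src.
$count = preg_match_all($pattern, $code, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "href:    {$m[1]}\n";
    echo "channel: {$m[2]}\n";
    echo "src:     {$m[3]}\n\n";
}

// Drop the trailing "s" modifier: "." can no longer reach the <img> on the
// next line, so no block matches.
$countWithoutS = preg_match_all(substr($pattern, 0, -1), $code);
```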

Here is the REGEX to play around with:

https://regex101.com/r/yX7qZ5/1

And here is a working demo using the expression in a PHP script:

http://ideone.com/YabeHW

Quixrick