I have the example below and want to extract the URL from it using XPath. The URL keeps changing, but the text "Url" and the div class stay the same throughout. Is it possible to extract the URL using only the text "Url" as an anchor, given that this text sits outside the <a> element?

<div class="Dataset">
"data1 : value1"
<br>
"data2: value2"
<br>
    "Url :"
    <a href="http://somechangingurl.com"/>
<br>
"data3: value3"
<br>
"data4: value4"
</div>
Jeeva

2 Answers

Although I think XPath is a good way to get the URL, since you want to match on the preceding text I would go for a regex:

$re = '/"(.+)"\s+<a href="(.+)"/';
$str = '<div class="Dataset">
"data1 : value1"
<br>
"data2: value2"
<br>
    "Url :"
    <a href="http://somechangingurl.com"/>
<br>
"data3: value3"
<br>
"data4: value4"
</div>';

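// Group 1 is the quoted label before the link, group 2 is the href value.
// PREG_OFFSET_CAPTURE makes each entry a [text, offset] pair.
preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE, 0);

// Print the entire match result; the URL itself is in $matches[2][0]
var_dump($matches);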
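
If only the URL is needed, the pattern can be anchored on the literal label instead of on any quoted text. The following is a minimal sketch under that assumption. The pattern and the variable names `$m` and `$url` are illustrative and not part of the original answer, and the optional `"?` is there in case the quotes around the label are not actually present in the real markup.

```php
<?php
// Trimmed copy of the markup from the question.
$str = '<div class="Dataset">
"data2: value2"
<br>
    "Url :"
    <a href="http://somechangingurl.com"/>
<br>
"data3: value3"
</div>';

// Match the label "Url", an optional closing quote, then the next <a href="...">.
// Group 1 captures everything inside the href attribute up to its closing quote.
$re = '/Url\s*:\s*"?\s*<a\s+href="([^"]+)"/';

// preg_match returns 1 on a match; fall back to null when the label is missing.
$url = preg_match($re, $str, $m) === 1 ? $m[1] : null;

echo $url, PHP_EOL;
```

This stays brittle if attributes are reordered or other markup appears between the label and the link, which is where an XPath query tends to be the safer choice.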
Gustavo Jantsch

I solved it myself. Here is the XPath expression I used:

//div[@class="Dataset"]/text()[contains(.,'Url :')]/following-sibling::a/@href
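
For anyone who wants to try the expression, here is a rough sketch of running it with PHP's built-in DOM extension. PHP is an assumption taken from the other answer, since the question does not name a language. The variable names are illustrative, and the `[1]` predicate is an addition that limits the result to the first link after the label.

```php
<?php
// Trimmed copy of the markup from the question.
$html = '<div class="Dataset">
"data2: value2"
<br>
    "Url :"
    <a href="http://somechangingurl.com"/>
<br>
"data3: value3"
</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Find the text node containing the label, then take the href of the
// first <a> sibling that follows it.
$query = '//div[@class="Dataset"]/text()[contains(., "Url :")]'
       . '/following-sibling::a[1]/@href';
$nodes = $xpath->query($query);

// query() returns a DOMNodeList; an empty list means the label was not found.
$url = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;

echo $url, PHP_EOL;
```

The expression works because the label is a text node that is a direct child of the div, so the link is one of its following siblings rather than a descendant.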
Jeeva