Extract text between first tag

Question

I have a string

$str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor <a href="http://example2.com">Do not want this text</a> incididunt ut labore et <a href="http://example.com">Want this text</a> dolore magna aliqua. Ut enim ad     minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in <a href="http://example.com">Do not want this text</a> reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

How can I extract the text between the first instance of an <a> tag that links to http://example.com? I don't want the text of the link to http://example2.com, or the text of the second link to http://example.com.

I want to return 'Want this text'. Any idea how to do this?

Thanks!

Possible duplicate of [Regex PHP, Match all links with specific text](http://stackoverflow.com/questions/1661179/regex-php-match-all-links-with-specific-text) — yivi, Jan 05 '17 at 15:25

score 2 · Answer 1 · answered Jan 05 '17 at 15:29

You can most likely achieve your goal using DOMDocument - in conjunction with DOMXPath for more complicated requirements.

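$dom=new DOMDocument;
$dom->loadHTML( $str );

// getElementsByTagName() always returns a DOMNodeList object, so test its length
$col=$dom->getElementsByTagName('a');
if( $col->length > 0 ){
    // prints the text of every link, in document order
    foreach( $col as $node )echo $node->nodeValue;
}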
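
For the specific requirement in the question, the DOMXPath route mentioned above could look roughly like the sketch below. The sample string is a shortened stand-in for the question's $str, and the variable names are illustrative; it assumes the href must match http://example.com exactly.

```php
<?php
// Shortened, hypothetical stand-in for the $str in the question.
$str = 'tempor <a href="http://example2.com">Do not want this text</a> labore et '
     . '<a href="http://example.com">Want this text</a> dolore. Duis aute '
     . '<a href="http://example.com">Do not want this text</a> velit.';

$dom = new DOMDocument;
$dom->loadHTML($str);

$xpath = new DOMXPath($dom);
// Select every <a> whose href is exactly http://example.com, in document order.
$nodes = $xpath->query('//a[@href="http://example.com"]');

// item(0) is the first match, or null when nothing matched.
$first = $nodes->item(0);
$text  = $first !== null ? $first->nodeValue : null;

echo $text, PHP_EOL;
```

Because the XPath predicate compares the whole attribute value, the link to http://example2.com is skipped, and item(0) ignores the later duplicate.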

shalvah · Answer 2 · 2017-01-05T15:39:28.813

You'll need to use DOMDocument, which lets PHP interact with an HTML page via the Document Object Model.

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false; // set before loading; setting it afterwards has no effect
$dom->loadHTML($str); // $str is the HTML string from the question
$links = $dom->getElementsByTagName('a');

At this point, you have a DOMNodeList, a traversable collection of objects. Each object is a DOMElement representing an <a> tag.

Assuming you want to retrieve the text of the first link, you'd then do: $text = $links->item(0)->nodeValue;

However, if you instead want the text that matches the link "http://example.com", then you could do:

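foreach ($links as $link)
{
  // read attributes with getAttribute(); ->attributes->href does not work
  if ($link->getAttribute('href') == "http://example.com") {
    $text = $link->nodeValue;
    break; // stop at the first matching link
  }
}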

score 0 · Answer 3 · answered Jan 05 '17 at 15:23

You can do this with a regex, for example:

<a href="http:\/\/example\.com"[^>]*>(.*?)<\/a>

Code snippet:

$str = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor <a href="http://example2.com">Do not want this text</a> incididunt ut labore et <a href="http://example.com">Want this text</a> dolore magna aliqua. Ut enim ad     minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in <a href="http://example.com">Do not want this text</a> reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

// PHP has no "g" modifier; preg_match() already stops at the first match
$regex = '/<a href="http:\/\/example\.com"[^>]*>(.*?)<\/a>/';
preg_match($regex, $str, $matches);

In $matches[1] you'll find the output you want.
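
If the URL is not hard-coded, the same idea can be wrapped in a small function. This is only a sketch: firstLinkTextByRegex is a hypothetical name, it needs PHP 7.1+ for the nullable return type, and it assumes double-quoted href values and no nested tags inside the link.

```php
<?php
// Hypothetical helper, not from the answer above: returns the text of the
// first <a> whose href is exactly $url, or null when there is no such link.
function firstLinkTextByRegex(string $html, string $url): ?string
{
    // preg_quote() escapes regex metacharacters in the URL (such as ".")
    // and, via its second argument, the "/" pattern delimiter.
    $pattern = '/<a\s[^>]*href="' . preg_quote($url, '/') . '"[^>]*>(.*?)<\/a>/is';

    // preg_match() stops at the first match; group 1 holds the link text.
    return preg_match($pattern, $html, $matches) === 1 ? $matches[1] : null;
}

// Shortened, hypothetical stand-in for the $str in the question.
$html = 'tempor <a href="http://example2.com">Do not want this text</a> labore et '
      . '<a href="http://example.com">Want this text</a> dolore. Duis aute '
      . '<a href="http://example.com">Do not want this text</a> velit.';

$result = firstLinkTextByRegex($html, 'http://example.com');
echo $result, PHP_EOL;
```

Quoting the URL matters here: with an unescaped dot, a pattern for example.com would also accept hosts such as exampleXcom.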

[you shouldn't use regex to parse HTML](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) — mister martin, Jan 05 '17 at 15:25
@mistermartin way faster and less buggy than DomDocument... As long as you don't need to parse an entire file, regex is better. — Blaatpraat, Jan 05 '17 at 15:39

score -1 · Answer 4 · answered Jan 05 '17 at 15:21

Use preg_match()

Example:

$string = '<a href="http://example2.com">Do not want this text</a> incididunt ut labore et <a href="http://example.com">Want this text</a> '; 

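// the "/" in the closing tag must be escaped because "/" is the pattern delimiter
if ( preg_match('/<\s*a[^<>]*>([^<>]+)<\/a>/i', $string, $matches) ) {
       var_dump($matches); // $matches[1] is the text of the first link, whatever its href
}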

answered by malutki5200

[you shouldn't use regex to parse HTML](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) – mister martin Jan 05 '17 at 15:23
And why is that? – malutki5200 Jan 05 '17 at 15:24
@malutki5200 in case you didn't notice [the link](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) *mister martin* used in the comment, you should read the answers (and comments) to [the question](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) – Sᴀᴍ Onᴇᴌᴀ Feb 28 '17 at 00:33
