0

Please help me resolve this. I'm stuck on this code.

My contents:

<a href="/path/1232432432">Get Me</a>
<a href="/path/7845454354"><img src="imagelink.png" /></a>
<a href="#">Other link</a>

I want to get the inner text "Get Me".

My regex: /(?<=\/path\/)(?!.*img).*?(?=<\/a>)/g

My results:

1232432432">Get Me

I need the match to exclude the digits after '/path/' (and the `">` that follows them), so that only the link text is returned.

Any help will be appreciated, thanks.

Limon Monte
  • 2
    Try to avoid parsing HTML with regular expressions; see http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – IvanGL Jan 13 '15 at 09:40

4 Answers

2

Use \K to drop the previously matched characters from the final match. \K resets the starting point of the reported match, so everything matched before it is kept out of the overall result.

\/path\/\d+">\K(?!.*img).*?(?=<\/a>)

DEMO

$re = "~\/path\/\d+\">\K(?!.*img).*?(?=<\/a>)~m";
$str = "<a href=\"/path/1232432432\">Get Me</a>\n<a href=\"/path/7845454354\"><img src=\"imagelink.png\" /></a>\n<a href=\"#\">Other link</a>";
preg_match_all($re, $str, $matches);
print_r($matches);
Avinash Raj
  • And today I've learned something new! But remember that it is HTML, and this method is error prone. The 'best' way to deal with all possibilities is using phpQuery or another DOM parsing tool. But your regex does the job! – Ismael Miguel Jan 13 '15 at 09:41
  • 1
    `\K` is not an anchor ... FYI – hwnd Jan 13 '15 at 10:17
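
Following up on the DOM-parser suggestion in the comments, here is a minimal sketch of the same extraction without a regex. It swaps in PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than phpQuery, and it assumes that the wanted links are exactly those whose href starts with /path/ and that contain no img element. The variable names are only illustrative.

```php
<?php
// Sample markup copied from the question.
$html = '<a href="/path/1232432432">Get Me</a>' . "\n"
      . '<a href="/path/7845454354"><img src="imagelink.png" /></a>' . "\n"
      . '<a href="#">Other link</a>';

// Collect libxml warnings internally instead of printing them;
// the input is a fragment, not a full HTML document.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Assumption: "wanted" means href starts with /path/ and no <img> inside.
$nodes = $xpath->query('//a[starts-with(@href, "/path/")][not(.//img)]');

$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->textContent);
}
print_r($texts);
```

Because the filter is expressed on the parsed tree, it does not depend on attribute order, quoting style, or line breaks inside the tag, which is where the regex versions tend to break.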
1
(?<=\/path\/)(?!.*img)[^"]+">(.*?)(?=<\/a>)

You can keep your own regex and use a capture group to get what you want: the link text ends up in group 1. See the demo.

https://regex101.com/r/sH8aR8/62

$re = "/(?<=\\/path\\/)(?!.*img)[^\"]+\">(.*?)(?=<\\/a>)/m";
$str = "<a href=\"/path/1232432432\">Get Me</a>\n<a href=\"/path/7845454354\"><img src=\"imagelink.png\" /></a>\n<a href=\"#\">Other link</a>";

preg_match_all($re, $str, $matches);
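
The snippet above stops at preg_match_all, so here is a small sketch of how the captured text could be read back. It uses the same pattern and input, written with a single-quoted pattern string to avoid the doubled backslashes; the variable name $linkTexts is an illustrative choice, not part of the answer.

```php
<?php
// Same pattern as above; single quotes keep the backslashes literal.
$re = '/(?<=\/path\/)(?!.*img)[^"]+">(.*?)(?=<\/a>)/m';
$str = "<a href=\"/path/1232432432\">Get Me</a>\n"
     . "<a href=\"/path/7845454354\"><img src=\"imagelink.png\" /></a>\n"
     . "<a href=\"#\">Other link</a>";

preg_match_all($re, $str, $matches);

// $matches[0] holds the full matches (digits and "> included);
// $matches[1] holds only what the first capture group matched.
$linkTexts = $matches[1];
print_r($linkTexts);
```

With the default PREG_PATTERN_ORDER flag, each index of $matches collects one group across all matches, which is why the link texts are read from $matches[1] rather than from $matches[0].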
vks
0

Maybe...?

/(?<=\/path\/1232432432">)(?!.*img).*?(?=<\/a>)/g

Joe
0

Use this one:

/<a\s+(.*?)>(.*?)<\/a>/img
RLX