
Here is an example of the string I'm matching against.

    <div class="unique"><a href="/2343242/link to something target="_self">"HERE IS THE TEXT
    I'D LIKE"</a></div>

This gets me way too much. As in, it doesn't stop at the closing `a` tag, and it still seems to return the `div` and `a` tags.

    /(?:<div class="unique">)?(?:<a href=.*>)?.*(?:<)?/

This returns nothing

    /(?:<div class="unique">)?(?:<a href=.*>)?.*(?:</a>)?/
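
If this is PHP (the phpfiddle link and the html_simple_dom remark in the comments point that way), one likely reason the second pattern returns nothing is the `/` in `</a>`: it closes the `/.../` delimiters early, so `preg_match` reports an unknown modifier and returns `false` instead of matching. Escaping it as `<\/a>`, or using a delimiter that doesn't occur in the pattern (for example `#...#`), avoids that. Below is a minimal sketch in JavaScript, chosen only because its regex literals use the same `/.../` notation (there the unescaped literal doesn't even parse). It assumes the sample sits on one line and that the line break above is just wrapping.

```javascript
// Sample input from the question, assumed here to be on a single line.
const html = `<div class="unique"><a href="/2343242/link to something target="_self">"HERE IS THE TEXT I'D LIKE"</a></div>`;

// The question's second pattern with the "/" in "</a>" escaped, so the
// regex literal no longer ends in the middle of the pattern.
const escaped = /(?:<div class="unique">)?(?:<a href=.*>)?.*(?:<\/a>)?/;

// Building the pattern from a string sidesteps the problem entirely:
// a string has no "/" delimiters for "</a>" to collide with.
const fromString = new RegExp('(?:<div class="unique">)?(?:<a href=.*>)?.*(?:</a>)?');

console.log(escaped.exec(html)[0]);
console.log(fromString.exec(html)[0]);
```

Both versions now compile and match, but each still returns the entire input; that is a separate problem caused by the greedy `.*`.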

So shouldn't the first part match the unique `div` tag and the following `a` tag without returning them, and then grab everything up to the first `<` it hits, which would be the closing `a` tag? I'm lost as to what is mucking this up.
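
Two things trip up that reasoning. First, `(?:...)` only means "group without capturing": the text it matches is still consumed and is still part of the overall match (`$matches[0]` in PHP, `match[0]` in JavaScript), so the tags come back. To leave them out, wrap the wanted text in a capturing `(...)` group and read that group instead. Second, `.*` is greedy, so inside `<a href=.*>` it runs to the *last* `>` it can reach, not the first. A minimal JavaScript sketch, again assuming the sample is on one line:

```javascript
// Sample input from the question, assumed here to be on a single line.
const html = `<div class="unique"><a href="/2343242/link to something target="_self">"HERE IS THE TEXT I'D LIKE"</a></div>`;

// The question's first pattern. The (?:...) groups still consume the tags,
// and the greedy .* inside <a href=.*> backtracks only as far as the LAST ">"
// in the input, so the overall match is the entire string.
const original = /(?:<div class="unique">)?(?:<a href=.*>)?.*(?:<)?/;
const whole = original.exec(html)[0];
console.log(whole === html); // true

// Lazy .*? stops at the first ">" and at the first "</a>", and the
// capturing group (.*?) holds just the text between the tags.
const fixed = /<div class="unique"><a href=.*?>(.*?)<\/a>/;
const m = fixed.exec(html);
console.log(m[0]); // the overall match, tags included
console.log(m[1]); // only the text, including its surrounding double quotes
```

In PHP the same idea would be `preg_match('/<div class="unique"><a href=.*?>(.*?)<\/a>/', $html, $matches)` followed by reading `$matches[1]`.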

Thank you.

TheEditor
  • Is your input guaranteed to _never_ have any nested `div`-s or other elements? If not, regex will be very, very hard (if not downright impossible) to use for this. HTML is not a regular language. – xxbbcc Feb 02 '13 at 19:31
  • Whatever language you're using, it **has** a library for HTML parsing and manipulation. Use it and save yourself lots of pain and bitter disappointment. – Martin Green Feb 02 '13 at 19:34
  • There will never be any nested anything. The HTML will never change from the form it is in above. Already been down the html_simple_dom road, and this is much easier in the long run. Just trying to figure out the mistake in my statement. – TheEditor Feb 02 '13 at 19:42
  • Why not use a simpler pattern like [this](http://phpfiddle.org/main/code/grv-nyx)? –  Feb 02 '13 at 20:27

1 Answer


Seems this works better.

    (?:<div class="unique">)(?:<a href=.*?>)?.*?(?:<.a>)
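
A minimal JavaScript sketch of the answer's pattern against the question's input (JavaScript stands in for whatever language was actually used). Two details are worth seeing. The `.` in `<.a>` matches any single character, including the `/` of `</a>`, so no escaping is needed; `<\/a>` is the stricter spelling. And `.` does not match a line break by default, so if the input really contains the break shown in the sample, the pattern only matches with the `s` (dotall) flag, written `/.../s` in both JavaScript and PHP.

```javascript
// The answer's pattern as a JavaScript regex literal.
const answer = /(?:<div class="unique">)(?:<a href=.*?>)?.*?(?:<.a>)/;

// Same pattern with the s (dotAll) flag, so "." also matches "\n".
const answerDotAll = /(?:<div class="unique">)(?:<a href=.*?>)?.*?(?:<.a>)/s;

// The sample on one line, and with the line break from the question's sample.
const oneLine = `<div class="unique"><a href="/2343242/link to something target="_self">"HERE IS THE TEXT I'D LIKE"</a></div>`;
const twoLines = `<div class="unique"><a href="/2343242/link to something target="_self">"HERE IS THE TEXT \nI'D LIKE"</a></div>`;

console.log(answer.test(oneLine));        // true
console.log(answer.test(twoLines));       // false: ".*?" cannot cross "\n"
console.log(answerDotAll.test(twoLines)); // true
```

Like the question's patterns, the overall match still starts at `<div class="unique">`; only a capturing group around the text would return it on its own.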

Works great.

TheEditor