Regex too much or none at all.

Question

Here is an example of the string I'm matching against.

<div class="unique"><a href="/2343242/link to something target="_self">"HERE IS THE TEXT 
I'D LIKE"</a></div>

This gets me way too much. As in, it doesn't stop at the `<`, and it seems to still return the div and a tags.

/(?:<div class="unique">)?(?:<a href=.*>)?.*(?:<)?/

This returns nothing

/(?:<div class="unique">)?(?:<a href=.*>)?.*(?:</a>)?/

So shouldn't the first part match against the unique "div" tag and the following "a" tag and not return them. Then grab everything up until the first < that it hits which would be the closing "a" tag? I'm lost as to what is mucking this up.

Thank you.

Is your input guaranteed to _never_ have any nested `div`-s or other elements? If not, regex will be very, very hard (if not downright impossible) to use for this. HTML is not a regular language. — xxbbcc, Feb 02 '13 at 19:31
Whatever language you're using, it **has** a library for HTML parsing and manipulation. Use it and save yourself lots of pain and bitter disappointment. — Martin Green, Feb 02 '13 at 19:34
There will never be any nested anything. The html will never change the form it is in above. Already been down the html_simple_dom road and this is much easier in the long run. Just trying to figure out the mistake in my statement. — TheEditor, Feb 02 '13 at 19:42
Why not use a simpler pattern like [this](http://phpfiddle.org/main/code/grv-nyx)? — , Feb 02 '13 at 20:27
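
To illustrate the parser route suggested in the comments, here is a minimal sketch using Python's standard-library `html.parser`. The class name `UniqueLinkText` and the cleaned-up sample string (with the `href` quote closed) are assumptions for illustration, not part of the original question.

```python
from html.parser import HTMLParser


class UniqueLinkText(HTMLParser):
    """Collect the text of <a> elements inside <div class="unique">."""

    def __init__(self):
        super().__init__()
        self.in_unique_div = False
        self.in_link = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "div" and ("class", "unique") in attrs:
            self.in_unique_div = True
        elif tag == "a" and self.in_unique_div:
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "div":
            self.in_unique_div = False

    def handle_data(self, data):
        # Keep only text seen while inside the wanted link
        if self.in_link:
            self.chunks.append(data)


# Assumed sample: same shape as the question's string, with the href quote closed
sample = ('<div class="unique"><a href="/2343242/link to something" '
          'target="_self">"HERE IS THE TEXT I\'D LIKE"</a></div>')

parser = UniqueLinkText()
parser.feed(sample)
parser.close()
text = "".join(parser.chunks)
```

This is more code than a one-line regex, but it would keep working if the markup later gained extra attributes or nesting.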

score 0 · Accepted Answer · answered Feb 02 '13 at 19:56

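This seems to work better. The lazy `.*?` quantifiers stop at the first possible match instead of running to the end of the string, and `<.a>` avoids an unescaped `/` inside a `/`-delimited pattern, which is likely why the second attempt returned nothing.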

(?:<div class="unique">)(?:<a href=.*?>)?.*?(?:<.a>)

Works great.
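
The difference between the greedy and lazy versions can be sketched in Python's `re` module (an assumption; the thread never names the regex engine, though the patterns carry over largely unchanged). The sample string follows the question, including its line break and the missing quote after the `href`. The variable names and the added capture group are illustrative only. A non-capturing group still contributes to the overall match, so a capture group is what isolates the link text.

```python
import re

# Sample input as posted in the question (line break and missing quote kept)
html = ('<div class="unique"><a href="/2343242/link to something target="_self">'
        '"HERE IS THE TEXT \nI\'D LIKE"</a></div>')

# Greedy version from the question: .* runs to the end of the input and gives
# back only what it must, and every later piece is optional, so nothing stops it.
greedy = re.compile(r'(?:<div class="unique">)?(?:<a href=.*>)?.*(?:<)?', re.DOTALL)

# Lazy version with a capture group around the wanted text.
# re.DOTALL lets "." cross the line break inside the link text.
lazy = re.compile(r'<div class="unique">(?:<a href=.*?>)?(.*?)</a>', re.DOTALL)

greedy_match = greedy.search(html).group(0)  # whole match, tags included
link_text = lazy.search(html).group(1)       # only the captured link text
```

Here `greedy_match` is the entire input, tags and all, while `link_text` holds just the text between the opening and closing `a` tags.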

answered Feb 02 '13 at 19:56

TheEditor


Can be shortened to `/<div class="unique">(?:<a href=.*?>)?.*?<\/a>/`
– nhahtdh Feb 02 '13 at 21:41
