How to get the nearest closing tag

Question

I have a string like:

<tr><td>abc</td><td style="any" class="marked">dfg</td><td>hij</td></tr>

and I'm trying to match the marked td tag with this regex:

/<td.*class="marked.*<\/td>/si

but I get this:

<td>abc</td><td style="any" class="marked">dfg</td><td>hij</td>

How should I change my regex to get this string instead?

<td style="any" class="marked">dfg</td>

You should see [this answer](http://stackoverflow.com/a/1732454/1864610). – Jan 20 '14 at 16:26

score 1 · Answer 1 · edited May 23 '17 at 10:25

.* is greedy and will match as much as possible.

.*? is lazy and will match as little as possible.

tl;dr: use .*? instead.
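To see the difference, here is a minimal sketch in Python's `re` module, used as a stand-in for the PCRE-style `/.../si` regex in the question (an assumption, since the question doesn't name a language; `re.S` and `re.I` play the role of the `s` and `i` flags). It runs the original pattern with greedy and with lazy quantifiers:

```python
import re

# The sample row from the question.
html = '<tr><td>abc</td><td style="any" class="marked">dfg</td><td>hij</td></tr>'

# Original pattern: both .* quantifiers are greedy.
greedy = re.search(r'<td.*class="marked.*</td>', html, re.S | re.I)

# Same pattern with lazy .*? quantifiers.
lazy = re.search(r'<td.*?class="marked.*?</td>', html, re.S | re.I)

print(greedy.group(0))
# <td>abc</td><td style="any" class="marked">dfg</td><td>hij</td>
print(lazy.group(0))
# <td>abc</td><td style="any" class="marked">dfg</td>
```

The lazy version stops at the first `</td>` after `class="marked`, but it still starts at the first `<td` in the string: the engine returns the leftmost match, and the leading `.*?` is allowed to run across `>abc</td><td style="any" ` to reach `class="marked`.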

That said, regex is not an HTML parser, but we've been through this many times before.
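For comparison, a parser-based sketch using Python's standard-library `html.parser` (a swapped-in technique, not what either answer uses; the class name `MarkedCellFinder` is hypothetical). It collects the text content of every `<td>` whose class list contains `marked`, so it returns `dfg` rather than the raw `<td ...>dfg</td>` string:

```python
from html.parser import HTMLParser


class MarkedCellFinder(HTMLParser):
    """Collects the text of every <td> whose class list contains 'marked'."""

    def __init__(self):
        super().__init__()
        self.in_marked = False  # True while inside a marked <td>
        self.cells = []         # text of each marked cell, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            self.in_marked = "marked" in classes
            if self.in_marked:
                self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_marked = False

    def handle_data(self, data):
        if self.in_marked:
            self.cells[-1] += data


html = '<tr><td>abc</td><td style="any" class="marked">dfg</td><td>hij</td></tr>'
finder = MarkedCellFinder()
finder.feed(html)
finder.close()
print(finder.cells)  # ['dfg']
```

Because the class attribute is split into tokens, a cell like `<td class="foo marked">` is also found, and attribute order or extra attributes don't matter.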

answered Jan 20 '14 at 16:26

h2ooooooo

score 1 · Accepted Answer · answered Jan 20 '14 at 16:26

You have two issues:

1. Your expression doesn't guarantee that class="marked" is associated with the same tag as the <td at the start of the expression.
2. The .*<\/td> at the end is greedy and will match all the way to the last closing </td>.

This pattern will address both these issues:

/<td[^>]+class="marked">.*?<\/td>/si
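A quick check of this pattern against the sample row, sketched with Python's `re` module as a stand-in for the PCRE syntax (an assumption; `re.S` and `re.I` correspond to the `s` and `i` flags, and `/` needs no escaping in Python):

```python
import re

html = '<tr><td>abc</td><td style="any" class="marked">dfg</td><td>hij</td></tr>'

# [^>]+ cannot cross a '>', so class="marked" has to sit inside the same
# <td ...> start tag; .*? then stops at the first </td> after that tag.
pattern = re.compile(r'<td[^>]+class="marked">.*?</td>', re.S | re.I)

match = pattern.search(html)
print(match.group(0))  # <td style="any" class="marked">dfg</td>
```

The first `<td>` can't start a match at all, because `[^>]+` needs at least one character before the `>`, so the search moves on to the marked cell.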

answered Jan 20 '14 at 16:26

jmar777

As mentioned by @h2ooooooo in his answer, parsing HTML with regex is generally a bad idea. It's only acceptable when you can make certain guarantees about the markup. For example, this pattern will fail with `class="foo marked"`, or for a host of other reasons if the HTML isn't very similar to what was in your example. – jmar777 Jan 20 '14 at 16:30
@jmar777: yes, I know regex isn't very good for HTML tasks. Thanks for the answer. – Vito Jan 20 '14 at 16:49
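Following up on the `class="foo marked"` limitation mentioned in the comment above, here is a hypothetical loosened version of the accepted pattern, sketched in Python's `re` (an assumption, as the question's language isn't stated). It accepts `marked` as one whitespace-separated token anywhere in the class list and allows other attributes after `class`, but it still assumes double-quoted attribute values, no `>` inside attribute values, and no nested tables:

```python
import re

# Accepted answer's pattern, unchanged.
strict = re.compile(r'<td[^>]+class="marked">.*?</td>', re.S | re.I)

# Hypothetical loosening: "marked" must be a whole class token,
# optionally preceded/followed by other tokens separated by whitespace.
tolerant = re.compile(
    r'<td\b[^>]*\bclass="(?:[^"]*\s)?marked(?:\s[^"]*)?"[^>]*>.*?</td>',
    re.S | re.I,
)

samples = [
    '<td style="any" class="marked">dfg</td>',  # the question's cell
    '<td class="foo marked">x</td>',            # extra class before "marked"
    '<td class="marked" id="c1">y</td>',        # attribute after class
    '<td class="unmarked">z</td>',              # must not match
]

results = [(bool(strict.search(s)), bool(tolerant.search(s))) for s in samples]
print(results)
# [(True, True), (False, True), (False, True), (False, False)]
```

Each added case is another special case handled by hand, which is the comment's point: once the markup varies this much, a real HTML parser is the safer choice.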
