How to get this regex working?

Question

I have a small problem. I want to find foo in

<tr><td>3</td><td>foo</td><td>2</td>

and I use:

$<tr><td>\d</td><td>(.*)</td>$

to find foo, but it doesn't work. Instead of stopping at the </td> right after foo, the match runs on to the </td> at the end of the string.
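Here is a minimal reproduction of the behaviour described above. It uses Python only for illustration, since the question doesn't name a language. Note that the leading $ in the original pattern means it cannot match at all; the greedy problem shows up once the anchors are removed.

```python
import re

# The row from the question.
row = "<tr><td>3</td><td>foo</td><td>2</td>"

# The pattern exactly as posted: a $ before "<tr>" can never be followed by "<",
# so this returns None.
original = re.search(r"$<tr><td>\d</td><td>(.*)</td>$", row)
print(original)  # None

# Without the stray anchors, (.*) is greedy: it runs to the end of the string and
# then backtracks only as far as the LAST "</td>".
greedy = re.search(r"<tr><td>\d</td><td>(.*)</td>", row)
print(greedy.group(1))  # foo</td><td>2
```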

Generally speaking, you'll want to use a real html parser, not a regular expression. See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Seth, Sep 01 '10 at 18:28
@Seth, others: Okay, saying the HTML parser thing as a *comment* is the way to go, instead of wasting an answer with that incorrectly. Good show. — Platinum Azure, Sep 01 '10 at 18:37
Also: Who's downvoting the question? It's a good question, well-asked. (+1 to offset) — Platinum Azure, Sep 01 '10 at 18:37
Not an answer to your specific question, but some info: with regard to developing and testing regexes, there are some AMAZING tools out there that will explain them in English and show you exactly what they're doing, along with full-featured building tools. Personally I prefer Expresso ( http://ultrapico.com/Expresso.htm ), but I'm sure there are others. They can help you with almost any regex need. — eidylon, Sep 01 '10 at 18:44

NullUserException · Accepted Answer · 2010-09-01T18:38:12.030

2

You have to make the .* lazy (.*?) instead of greedy. Read more about lazy vs. greedy quantifiers.
Your end-of-string anchors ($) also don't make sense: a $ before <tr> can never match. Try:

<tr><td>\d<\/td><td>(.*?)<\/td>

(As seen on Rubular.)
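The same fix can be checked outside Rubular. Here is a short sketch in Python, chosen only for illustration. The \/ escapes in the answer matter where / delimits the regex literal (as in Ruby or JavaScript). Python treats \/ as a plain slash, so both spellings match the same way.

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# (.*?) is lazy: it grows one character at a time and stops at the FIRST "</td>"
# that lets the rest of the pattern match.
lazy = re.search(r"<tr><td>\d</td><td>(.*?)</td>", row)
print(lazy.group(1))  # foo

# The answer's escaped form; in Python "\/" is simply a literal "/".
escaped = re.search(r"<tr><td>\d<\/td><td>(.*?)<\/td>", row)
print(escaped.group(1))  # foo
```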

NOTE: I don't advocate using regex to parse HTML. But sometimes the task at hand is simple enough to be handled by regex, and a full-blown XML parser is overkill (for example, this question). Knowing how to pick the "right tool for the job" is an important skill in programming.
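For comparison, here is a minimal sketch of the parser route that the comments recommend. It uses Python's standard-library html.parser, and the choice of language is ours. The class name CellCollector is hypothetical, and the sketch assumes the table has no nested tables.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text inside a cell (including inside tags like <b>) is appended to it.
        if self._in_td:
            self.cells[-1] += data


parser = CellCollector()
parser.feed("<tr><td>3</td><td>foo</td><td>2</td>")
parser.close()
print(parser.cells)     # ['3', 'foo', '2']
print(parser.cells[1])  # foo
```

For a one-off extraction like the question's, the regex is shorter. The parser version keeps working if a cell later gains markup or attributes.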

edited Sep 01 '10 at 18:38

answered Sep 01 '10 at 18:26

NullUserException


I'm just going to say it wasn't me (even though I did downvote another post for saying HTML isn't regular and should not be parsed with regex). You're actually answering the question. (EDIT: +1 for you) – Platinum Azure Sep 01 '10 at 18:34

Senseful · Answer 2 · 2010-09-01T18:32:22.353

0

Use:

^<tr><td>\d</td><td>(.*?)</td>
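The leading ^ ties the match to the start of the input. Here is a Python sketch with a hypothetical two-row input, one row per line, showing how ^ behaves with and without multiline mode.

```python
import re

# Hypothetical input: two rows, one per line.
table = "<tr><td>3</td><td>foo</td><td>2</td>\n<tr><td>4</td><td>bar</td><td>5</td>"

pattern = r"^<tr><td>\d</td><td>(.*?)</td>"

# Without re.MULTILINE, ^ matches only at the very start of the string.
print(re.findall(pattern, table))                # ['foo']

# With re.MULTILINE, ^ also matches right after each newline.
print(re.findall(pattern, table, re.MULTILINE))  # ['foo', 'bar']
```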

(insert obligatory comment about not using regex to parse XML)

edited Sep 01 '10 at 18:32

answered Sep 01 '10 at 18:25

Senseful


score 0 · Answer 3 · answered Sep 01 '10 at 18:26

0

Your leading $ should be a ^.

If you don't want to match all the way to the end of the string, don't use a $ at the end. However, since * is greedy, it'll grab as much as it can. Some regex implementations have a non-greedy version (*?) that would work, but you probably just want to change (.*) to ([^<]*).
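The negated character class can be checked the same way. This Python sketch reuses the row from the question. It also adds a hypothetical row with a <b> tag inside the cell to show the trade-off: [^<]* cannot cross a "<", so it never runs past the next tag, but it also refuses cells that contain markup.

```python
import re

pattern = r"^<tr><td>\d</td><td>([^<]*)</td>"

row = "<tr><td>3</td><td>foo</td><td>2</td>"
m = re.search(pattern, row)
print(m.group(1))  # foo

# Hypothetical row with markup inside the cell: [^<]* matches nothing before "<b>",
# and "</td>" doesn't follow, so there is no match.
nested = "<tr><td>3</td><td><b>foo</b></td><td>2</td>"
nested_match = re.search(pattern, nested)
print(nested_match)  # None
```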

answered Sep 01 '10 at 18:26

dash-tom-bang


Indeed, I'm curious what was wrong enough about this answer to demand a downvote. Alas. – dash-tom-bang Sep 02 '10 at 00:26
