
I have a small problem. I want to extract the foo from this string:

<tr><td>3</td><td>foo</td><td>2</td>

To find foo, I use:

$<tr><td>\d</td><td>(.*)</td>$

but it doesn't work, because (.*) doesn't stop at the </td> right after foo. It runs on to the </td> at the end of the string.
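To make the problem concrete, here is a minimal sketch in Python's re module. The question doesn't name a language, so Python is an assumption. The leading $ is dropped so the pattern can match at all. The greedy (.*) runs to the last </td> in the string:

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# Greedy: (.*) first swallows the rest of the string, then backtracks only
# as far as the LAST "</td>", so the capture spans two cells.
greedy = re.search(r"<tr><td>\d</td><td>(.*)</td>", row)
print(greedy.group(1))  # foo</td><td>2
```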

gurehbgui
  • Generally speaking, you'll want to use a real HTML parser, not a regular expression. See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Seth Sep 01 '10 at 18:28
  • @Seth, others: Okay, saying the HTML parser thing as a *comment* is the way to go, instead of wasting an answer with that incorrectly. Good show. – Platinum Azure Sep 01 '10 at 18:37
  • Also: Who's downvoting the question? It's a good question, well-asked. (+1 to offset) – Platinum Azure Sep 01 '10 at 18:37
  • Not an answer to your specific question, but useful info: for developing and testing regexes, there are some AMAZING tools out there that will explain them in English, show you exactly what they're doing, and offer full-featured building tools. Personally I prefer Expresso ( http://ultrapico.com/Expresso.htm ), but I'm sure there are others. They can help you solve almost any regex need. – eidylon Sep 01 '10 at 18:44
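For comparison, here is a minimal sketch of the parser approach using Python's standard-library html.parser. Python and the CellCollector class name are illustrative assumptions, not something the commenters posted. It collects the text of each <td> cell instead of matching markup with a regex:

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Hypothetical helper: collects the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self._in_td:
            self.cells[-1] += data

parser = CellCollector()
parser.feed("<tr><td>3</td><td>foo</td><td>2</td>")
parser.close()
print(parser.cells)     # ['3', 'foo', '2']
print(parser.cells[1])  # foo
```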

3 Answers


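You have to make the .* lazy instead of greedy. Read more about lazy vs greedy here.
Your end-of-string anchors ($) also don't make sense: a $ before <tr> can never match, and the trailing $ forces the match to run to the end of the string. Try: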

<tr><td>\d<\/td><td>(.*?)<\/td>

(As seen on rubular.)
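A minimal Python sketch of the lazy version, again assuming Python's re module. The \/ escapes in the pattern above are only needed inside a /.../ regex literal (Rubular tests Ruby regexes). A Python raw string doesn't need them:

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# Lazy: (.*?) starts by matching nothing and grows one character at a time,
# stopping at the FIRST "</td>" that lets the rest of the pattern succeed.
lazy = re.search(r"<tr><td>\d</td><td>(.*?)</td>", row)
print(lazy.group(1))  # foo
```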

NOTE: I don't advocate using regex to parse HTML. But sometimes the task at hand is simple enough to be handled by regex, and a full-blown XML parser would be overkill (this question, for example). Knowing how to pick the "right tool for the job" is an important skill in programming.

NullUserException
  • I'm just going to say it wasn't me (even though I did downvote another post for saying HTML isn't regular and should not be parsed with regex). You're actually answering the question. (EDIT: +1 for you) – Platinum Azure Sep 01 '10 at 18:34

Use:

^<tr><td>\d</td><td>(.*?)</td>
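A minimal Python sketch contrasting the two anchors, assuming Python's re module with no flags. Without re.MULTILINE, $ only matches at the end of the string (or just before a final newline), so nothing can follow it and the original pattern never matches. ^ pins the match to the start of the string instead:

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# "$<tr>..." can never match: "<" cannot follow the end of the string.
broken = re.search(r"$<tr><td>\d</td><td>(.*?)</td>", row)
print(broken)  # None

# "^" anchors the match at the start of the string instead.
anchored = re.search(r"^<tr><td>\d</td><td>(.*?)</td>", row)
print(anchored.group(1))  # foo

# With "^", a row that doesn't start at position 0 is rejected.
print(re.search(r"^<tr><td>\d</td><td>(.*?)</td>", "  " + row))  # None
```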

(insert obligatory comment about not using regex to parse xml)

Senseful

Your leading $ should be a ^.

If you don't want to match all the way to the end of the string, don't put a $ at the end. However, since * is greedy, it'll grab as much as it can. Some regex implementations have a non-greedy quantifier (*?) that would work, but you probably just want to change (.*) to ([^<]*).
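A minimal Python sketch of the negated character class, assuming Python's re module. Because [^<] cannot match "<", the greedy * still stops at the first tag after foo:

```python
import re

row = "<tr><td>3</td><td>foo</td><td>2</td>"

# [^<]* can't cross a "<", so the capture can never include markup.
m = re.search(r"^<tr><td>\d</td><td>([^<]*)</td>", row)
print(m.group(1))  # foo

# Trade-off: a cell whose content contains markup doesn't match at all.
print(re.search(r"^<tr><td>\d</td><td>([^<]*)</td>", "<tr><td>3</td><td><b>foo</b></td>"))  # None
```

The lazy (.*?) version would instead capture <b>foo</b> from that second row. Which behavior you want depends on whether cells can contain nested tags.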

dash-tom-bang