
If my HTML is:

<tr><td>....</td><hr></tr>
<tr><td>....</td><hr></tr>
<tr><td>....</td><hr></tr>
<tr><td>....</td><hr></tr>
<tr><td>....</td><hr></tr>
<tr><td>....</td><hr></tr>

If my regex is:

Pattern p = Pattern.compile("<tr>(.*)<hr></tr>");

Should this get 1 result or all the individual rows?

Is there a way to force it to get all the rows, and not just the entire HTML from the top <tr> to the last instance of <hr></tr>?

Blankman

1 Answer


Your regex uses .*, which is greedy. Try .*? instead. A greedy match grabs as much as it can before matching the tokens that follow, so it runs on to the last <hr></tr> in your source text. A non-greedy (reluctant) match grabs as little as it can before matching the next token(s), so it stops at the first <hr></tr> after each <tr>.
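To make the difference concrete, here is a minimal sketch comparing the two patterns. The class name LazyMatchDemo, the helper extractRows, and the sample rows are made up for illustration. One assumption worth flagging: in Java, . does not match line terminators by default, so the sketch passes Pattern.DOTALL to let a match span several lines; without that flag, rows that each sit on their own line would already be matched one at a time even by the greedy pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyMatchDemo {
    // Hypothetical helper: returns capture group 1 for every match of regex in html.
    static List<String> extractRows(String html, String regex) {
        // DOTALL lets '.' match line terminators, so a match can span lines.
        Pattern p = Pattern.compile(regex, Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> rows = new ArrayList<>();
        while (m.find()) {          // find() advances to the next match each call
            rows.add(m.group(1));
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<tr><td>a</td><hr></tr>\n"
                    + "<tr><td>b</td><hr></tr>\n"
                    + "<tr><td>c</td><hr></tr>";

        // Greedy: one match running from the first <tr> to the last <hr></tr>.
        System.out.println(extractRows(html, "<tr>(.*)<hr></tr>").size());   // prints 1

        // Reluctant: one match per row.
        System.out.println(extractRows(html, "<tr>(.*?)<hr></tr>").size());  // prints 3
    }
}
```

Note that the loop over m.find() is what yields every row; calling matches() or a single find() would only ever give one result, whichever quantifier is used.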

Then, see this answer for more information about parsing HTML with regular expressions.

Community
Greg Hewgill