
I have the following HTML:

<tr>
<td style="text-align: left;" colspan="1">10:10</td>
<td style="text-align: left;" colspan="1">This is a description.</td>
</tr>
<tr>
<td colspan="1">10:30</td>
<td colspan="1">This is another description.</td>
</tr>

I want to return multiple matches, each consisting of two groups: group 1 is the timestamp and group 2 is the description.

When I run

re.findall(r'<td.*>(\d\d:\d\d)<\/td><td.*>(.*?)<\/td>', HTML)

I'm only getting the last match:

[('10:30', 'This is another description.')]

Can anyone tell me what's wrong with my regex?

scuzzi

1 Answer


Your first .* is greedy, so it matches as many characters as it can. You get exactly one match, spanning from the first <td to the last </td>, and the groups capture only the final row. Using [^>]* instead of .* in both <td.*> parts keeps each one inside a single tag.
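As a rough sketch of that change, the snippet below runs the adjusted pattern against the markup from the question. It assumes the HTML arrives as a single line with no whitespace between tags, which the single result in the question suggests. The \s* between the two cells is an extra not mentioned above, added only so the pattern tolerates line breaks or indentation. The name pattern is illustrative.

```python
import re

# Assumption: the markup is one line with no whitespace between tags.
HTML = (
    '<tr>'
    '<td style="text-align: left;" colspan="1">10:10</td>'
    '<td style="text-align: left;" colspan="1">This is a description.</td>'
    '</tr>'
    '<tr>'
    '<td colspan="1">10:30</td>'
    '<td colspan="1">This is another description.</td>'
    '</tr>'
)

# [^>]* cannot cross a '>', so each <td ...> part stays inside one tag.
# \s* is an addition that allows whitespace between the two cells.
pattern = re.compile(r'<td[^>]*>(\d\d:\d\d)</td>\s*<td[^>]*>(.*?)</td>')

matches = pattern.findall(HTML)
print(matches)
# [('10:10', 'This is a description.'), ('10:30', 'This is another description.')]
```

For anything beyond a small, predictable snippet like this, an HTML parser is usually a safer choice than a regex.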

The Zach Man