re.findall only returning the last match

Question

I have the following HTML:

<tr>
<td style="text-align: left;" colspan="1">10:10</td>
<td style="text-align: left;" colspan="1">This is a description.</td>
</tr>
<tr>
<td colspan="1">10:30</td>
<td colspan="1">This is another description.</td>
</tr>

I want to return multiple matches, each consisting of two groups: group 1 is the timestamp and group 2 is the description.

When I run

re.findall(r'<td.*>(\d\d:\d\d)<\/td><td.*>(.*?)<\/td>', HTML)

I'm only getting the last match:

[('10:30', 'This is another description.')]

Can anyone tell me what's wrong with my regex?

try with [^>]* instead of .*, you may be eating too many chars, esp. on the first .* — B. Go, Feb 08 '20 at 23:57

score 2 · Accepted Answer · answered Feb 09 '20 at 00:00

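Your first .* is greedy: it matches as many characters as it can, so you get exactly one match that spans everything from the first <td to the last </td>, and only the final timestamp and description end up in the groups. Using [^>]* instead of .* inside the two <td...> tags stops each one at the end of its own tag, so every row is matched separately.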
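A minimal sketch of the difference, assuming the HTML is on a single line (which the output in the question implies, since the pattern has nothing between </td> and <td). The variable names `greedy` and `fixed` are just for illustration. The `\s*` between the cells is an addition that is not in the original pattern; it is there so the fixed version also works if the rows are split across lines as displayed in the question.

```python
import re

# Assumption: the HTML is one line, as the reported output suggests.
HTML = (
    '<tr><td style="text-align: left;" colspan="1">10:10</td>'
    '<td style="text-align: left;" colspan="1">This is a description.</td></tr>'
    '<tr><td colspan="1">10:30</td>'
    '<td colspan="1">This is another description.</td></tr>'
)

# Original pattern: the greedy .* inside each <td...> runs past the end of its tag.
greedy = r'<td.*>(\d\d:\d\d)</td><td.*>(.*?)</td>'

# Fixed pattern: [^>]* cannot cross a '>', so it stays inside a single tag.
# \s* (an addition, not part of the original) tolerates whitespace between cells.
fixed = r'<td[^>]*>(\d\d:\d\d)</td>\s*<td[^>]*>(.*?)</td>'

greedy_matches = re.findall(greedy, HTML)
fixed_matches = re.findall(fixed, HTML)

print(greedy_matches)  # [('10:30', 'This is another description.')]
print(fixed_matches)   # [('10:10', 'This is a description.'), ('10:30', 'This is another description.')]
```

For anything beyond a small, regular snippet like this, an HTML parser is usually a safer choice than a regex.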

answered Feb 09 '20 at 00:00

The Zach Man
