I'm trying to use a regex to match a cell in a table, but the problem is that not all cells follow the same pattern. For example, the td may take this format:
<td><a href="page101010.html">PageNumber</a></td>
or this format:
<td align="left" ></td>
Basically, the hyperlink inside the td is not present in every cell, only in some.
I tried to handle both cases with the Python regex below, but it's failing.
match = re.search(r'<td align="left" ><?a?.+\>?(.+)\<?\/?a?\>?\<\/td\>', tdlink)
I just need 'match' to find the part enclosed in () above. However, I get either a syntax error or a None object.
Where am I going wrong?
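One thing that stands out in the pattern above: it hard-codes align="left" and both .+ pieces require at least one character, so it cannot match the first format (no align attribute) or the empty cell (nothing between the tags), and re.search returns None in both cases. Below is a minimal sketch of one way to cover both formats, assuming each cell holds either nothing or a single <a> element. The names CELL_RE and link_text are made up for illustration, and a regex like this is only reasonable for small, predictable HTML snippets.

```python
import re

# Hypothetical pattern: any attributes on <td>, then an optional <a>...</a>.
# Group 1 captures the link text when the anchor is present.
CELL_RE = re.compile(r'<td\b[^>]*>\s*(?:<a\b[^>]*>(.*?)</a>)?\s*</td>')

def link_text(cell_html):
    """Return the link text inside a td, or None if there is no link or no match."""
    m = CELL_RE.search(cell_html)
    if m is None:
        # The string did not look like a td cell at all.
        return None
    # group(1) is None when the optional <a> part did not take part in the match.
    return m.group(1)

print(link_text('<td><a href="page101010.html">PageNumber</a></td>'))
print(link_text('<td align="left" ></td>'))
```

Checking the result against None before calling .group() avoids the error that appears when a method is called on a failed match.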