I'm using regex in Python to extract data from HTML. The regex I've written looks like this:
result = re.findall(r'<td align="left" csk="(\d\d\d\d)\d\d\d\d"><a href=.*?>(.*?)</a></td>\s+|<td align="lef(.*?)" >(.*?)</td>\s+', webpage)
The assumption is that each td follows one of these two formats:
<td align="left" csk="(\d\d\d\d)\d\d\d\d"><a href=.*?>(.*?)</a></td>\s+
OR
<td align="lef(.*?)" >(.*?)</td>
This is because the td can take different formats in that particular cell (it either has data with a link, or has no data at all).
I suspect the OR condition I've used is incorrect: I believe the | is alternating only between the sub-expression immediately before it and the one immediately after it, not between the two entire td patterns.
My question is: how do I group it (for example, with parentheses) so that the OR applies between the two entire td patterns?
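
To make the grouping question concrete, here is a minimal sketch of one possible approach: wrapping both td patterns in a non-capturing group `(?:...|...)`, so the alternation is explicitly scoped to the two whole patterns and the trailing `\s+` is shared. The sample `webpage` string is made up for illustration and is not the real page. The small `ab|cd` check at the top suggests that, in Python's `re`, `|` already separates everything to its left from everything to its right within the enclosing group, rather than just the neighbouring characters.

```python
import re

# '|' splits the whole pattern into 'ab' or 'cd'; it is not 'a(b|c)d'.
print(re.fullmatch(r'ab|cd', 'ab') is not None)       # whole alternative 'ab'
print(re.fullmatch(r'ab|cd', 'abd') is not None)      # not matched by 'ab|cd'
print(re.fullmatch(r'a(?:b|c)d', 'abd') is not None)  # grouping narrows the OR

# Hypothetical sample input, invented for this sketch only.
webpage = (
    '<td align="left" csk="20120101"><a href="/x">Alpha</a></td>\n'
    '<td align="left" ></td>\n'
)

# (?: ... | ... ) groups the two td formats without adding a capture group;
# the \s+ after the group applies to whichever alternative matched.
pattern = re.compile(
    r'(?:'
    r'<td align="left" csk="(\d\d\d\d)\d\d\d\d"><a href=.*?>(.*?)</a></td>'
    r'|'
    r'<td align="lef(.*?)" >(.*?)</td>'
    r')\s+'
)

# findall returns one tuple per match, with '' for groups that did not take part.
result = pattern.findall(webpage)
print(result)
```

With four capture groups in total, each tuple has four slots: the first two are filled when the linked format matches, the last two when the plain format matches.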