The pattern is as follows:
page_pattern = 'manual-data-link" href="(.*?)"'
The matching function is as follows, where pattern is one of the predefined patterns, such as page_pattern above:
def get_pattern(pattern, string, group_num=1):
    escaped_pattern = re.escape(pattern)
    match = re.match(re.compile(escaped_pattern), string)
    if match:
        return match.group(group_num)
    else:
        return None
The problem is that match is always None, even though I checked that the pattern matches the test string on http://pythex.org/. I suspect I'm not compiling or escaping the pattern correctly.
Test string:
<a class="rarity-5 set-102 manual-data-link" href="/data/123421" data-id="20886" data-type-id="295636317" >Data</a>
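
For reference, here is a minimal sketch of what may be going wrong, using the pattern and test string above. It rests on two documented behaviours of the re module: re.escape makes every metacharacter literal, so the group (.*?) stops being a capture group, and re.match only matches at the very start of the string, whereas re.search scans the whole string. The name find_pattern is a hypothetical variant of get_pattern, not code from my project.

```python
import re

page_pattern = 'manual-data-link" href="(.*?)"'
test_string = ('<a class="rarity-5 set-102 manual-data-link" href="/data/123421" '
               'data-id="20886" data-type-id="295636317" >Data</a>')

def find_pattern(pattern, string, group_num=1):
    # Hypothetical variant of get_pattern: no re.escape, and re.search
    # instead of re.match so the match may start anywhere in the string.
    match = re.search(pattern, string)
    return match.group(group_num) if match else None

# Original approach: escaped pattern, anchored at the start of the string.
original_style = re.match(re.escape(page_pattern), test_string)

# Escaping alone also fails, because "(.*?)" is now matched literally.
escaped_search = re.search(re.escape(page_pattern), test_string)

result = find_pattern(page_pattern, test_string)
print(original_style, escaped_search, result)
```

If only part of a pattern comes from untrusted or literal text, it is probably safer to apply re.escape to that part alone rather than to the whole expression.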