I am trying to write a regex that extracts links from a page source. I have text formatted like this:
something here here's a link
<a class="_5syj" href="https://www.here.com/FirstCal?ref=br_rs">First Cal</a><span class="mls _1ccm9 _49"></span><a class="_fasc" href="https://www.here.com/Mall?ref=br_rs">Mall</a><span class="m1ls _1cm9 _49"></span>
I want to get all the links of the form href="https://www.here.com/(.*)?ref=br_rs">
So from the links above, I would get either the entire link, or just FirstCal and Mall (the part of each link between the domain and ?ref=br_rs).
Python code:
import re
regex = r'(?<=href="https://www.here.com/).*(?<=?ref=br_rs)'
link = re.findall(regex, str(source))
link
But it's not working.
Any ideas?
PS: Regex is the only way to do this. An HTML parser won't work because the website's structure is not "stable".
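
For reference, here is a minimal sketch of a pattern that seems to fit the sample above. It is not necessarily the only or best approach. The attempt in the question most likely fails for three reasons: the unescaped `?` inside the second lookbehind is read as a quantifier with nothing to repeat, the greedy `.*` can run across several links on one line, and the trailing assertion is a lookbehind where a lookahead (or a plain literal) was probably intended. The sketch escapes the literal `.` and `?`, and uses capture groups with a `[^"?]*` character class so that each match stays inside a single href value. The `source` string is a stand-in copied from the sample, and the variable names are illustrative.

```python
import re

# Stand-in for the real page source, copied from the sample in the question.
source = (
    "something here here's a link\n"
    '<a class="_5syj" href="https://www.here.com/FirstCal?ref=br_rs">First Cal</a>'
    '<span class="mls _1ccm9 _49"></span>'
    '<a class="_fasc" href="https://www.here.com/Mall?ref=br_rs">Mall</a>'
    '<span class="m1ls _1cm9 _49"></span>'
)

# Group 1 captures the whole URL, group 2 only the part between the domain and "?ref=br_rs".
# '.' and '?' are escaped so they match literally; [^"?]* cannot cross a quote or a '?',
# so a match never spills over into the next link.
pattern = re.compile(r'href="(https://www\.here\.com/([^"?]*)\?ref=br_rs)"')

# With two groups, findall returns a list of (full_url, name) tuples.
matches = pattern.findall(str(source))
full_links = [full for full, _ in matches]
names = [name for _, name in matches]

print(full_links)  # ['https://www.here.com/FirstCal?ref=br_rs', 'https://www.here.com/Mall?ref=br_rs']
print(names)       # ['FirstCal', 'Mall']
```

If only the short names are needed, dropping the outer group leaves a single capture group, and `findall` then returns a flat list of strings instead of tuples.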