
I ran into a problem with a scraping bot I am currently building. One of its tasks checks every A tag in an HTML page to see whether a target URL specified by the user is present in that page.

Here's the code snippet.

for anchor in soup.findAll('a', href=True):
  if target in anchor['href']:
    backlink_update()
    return 0

Currently this works in the following case: the target URL is domain.com and the web page contains an A tag whose href is domain.com or domain.com/dsadsd/dsds.

However, the check fails if the target URL is domain.com and the URL on the web page is Domain.com or DoMaIn.com, because the substring test is case-sensitive.

So I need to match any case variation of the target on a given web page: domain.com, Domain.com, or even DomAin.com.

I tried the following regex pattern, but it did not work:

\s?[^a-zA-Z0-9\_](?i)yaconsyrupstory(?-i)[^a-zA-Z0-9\_]
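For comparison, here is a minimal sketch of a case-insensitive match done with Python's re module and the re.IGNORECASE flag instead of inline flag toggles. The helper name make_target_pattern and the sample URLs are hypothetical, chosen only for illustration; re.escape is used so that the dot in the target is treated literally.

```python
import re

def make_target_pattern(target):
    # re.escape keeps "." in "domain.com" from matching any character;
    # re.IGNORECASE makes the whole pattern case-insensitive.
    return re.compile(re.escape(target), re.IGNORECASE)

pattern = make_target_pattern("domain.com")

# Hypothetical href values such as those read from anchor['href'].
hrefs = ["http://DoMaIn.com/dsadsd/dsds", "http://example.org/"]
matches = [h for h in hrefs if pattern.search(h)]
```

In the loop from the question, the condition would then be something like pattern.search(anchor['href']) rather than the plain substring test.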
Gaurav Jagtap
  • Sounds like you need to check if a string contains another string in a case insensitive way. See [How to make string check case insensitive in Python 3.2](http://stackoverflow.com/a/5889970/3832970). – Wiktor Stribiżew May 11 '17 at 13:15
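The suggestion in the comment above can be sketched as follows, assuming it is acceptable to lower-case both strings before comparing them. The function names href_contains_target and page_links_to and the sample URLs are made up for this example and are not part of the original bot.

```python
def href_contains_target(href, target):
    # Normalise both sides to lower case so that "DoMaIn.com"
    # and "domain.com" compare as equal substrings.
    return target.lower() in href.lower()

def page_links_to(hrefs, target):
    # True as soon as any href contains the target, ignoring case.
    return any(href_contains_target(h, target) for h in hrefs)

# Hypothetical href values such as those read from anchor['href'].
sample_hrefs = ["http://example.org/", "http://Domain.com/dsadsd/dsds"]
found = page_links_to(sample_hrefs, "domain.com")
```

This keeps the original substring logic and only changes how the two strings are compared, so it would also match the target inside a longer host name, just as the original check does.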
