I am trying to read the contents of a file and check whether it matches a list of patterns using regular expressions.

File content:

google.com
https://google.com
yahoo.com
www.yahoo.com
yahoo

My code:

import re
file = 'data_files/test_files/content.txt'

regex_1 = re.compile("google")
regex_2 = re.compile("yahoo")

data = open(file, 'r')

print ("Checking Regex 1")
if regex_1.match(data.read()):
    count_c = len(regex_1.findall(data.read()))
    print ("Matched Regex 1 - " + str(count_c))
print("Checking Regex 2")

if regex_2.match(data.read()):
    count_d = len(regex_2.findall(data.read()))
    print("Matched Regex 2 -  " + str(count_d))
else:
    print ("No match found")

Output:

Checking Regex 1
Checking Regex 2
No match found

I can't figure out what is wrong here.

jonrsharpe
Karthik
  • @jonrsharpe The problem here is that he's calling `data.read()` repeatedly, and only the first one reads anything because he doesn't rewind. What does that duplicate have to do with it? – Barmar Apr 06 '19 at 15:43
  • @Barmar their other problem is that `match` only matches at the *start* of the string anyway. But that's only in the second answer, so maybe not the best target. – jonrsharpe Apr 06 '19 at 15:44
  • @jonrsharpe Here's the dupe I use for that issue: https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match – Barmar Apr 06 '19 at 15:48

1 Answer

Every time you call data.read(), it starts reading from the place in the file where the last call finished. Since the first call reads the entire file (because you didn't specify a limit), all the remaining calls start reading from the end of the file, so they don't read anything.
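
As a minimal sketch of that behaviour (the temporary file and its two-line content are made up for illustration, not the file from the question), the second read() returns an empty string until the file is rewound with seek(0):

```python
import os
import tempfile

# Create a small throwaway file; the content is a hypothetical sample.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("google.com\nyahoo.com\n")
    path = tmp.name

with open(path, "r") as data:
    first = data.read()    # reads the whole file; position is now at the end
    second = data.read()   # nothing left to read, so this is an empty string
    data.seek(0)           # rewind to the beginning of the file
    third = data.read()    # the full contents again

os.remove(path)

print(repr(first))
print(repr(second))
print(first == third)
```

Rewinding works, but reading once into a variable is usually simpler.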

You should read the file into a variable, and then use that instead of calling data.read() repeatedly.

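You also need to use search(), not match(): match() only succeeds if the pattern matches at the very beginning of the string, while search() scans the whole string. See What is the difference between re.search and re.match? (https://stackoverflow.com/questions/180986/what-is-the-difference-between-re-search-and-re-match)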
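
To illustrate the difference, here is a small sketch that uses the file content from the question as an in-memory string (the variable names at_start, anywhere and count are only for this example):

```python
import re

# The file content from the question, held in a string instead of a file.
contents = "google.com\nhttps://google.com\nyahoo.com\nwww.yahoo.com\nyahoo\n"

regex_2 = re.compile("yahoo")

# match() only tries at position 0; the string starts with "google".
at_start = regex_2.match(contents)

# search() scans forward until it finds the first occurrence.
anywhere = regex_2.search(contents)

# findall() returns every non-overlapping occurrence.
count = len(regex_2.findall(contents))

print(at_start)           # None
print(anywhere.group())   # yahoo
print(count)              # 3
```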

import re
file = 'data_files/test_files/content.txt'

regex_1 = re.compile("google")
regex_2 = re.compile("yahoo")

# Read the file once and reuse the string for every pattern.
with open(file, 'r') as data:
    contents = data.read()

print("Checking Regex 1")
if regex_1.search(contents):
    count_c = len(regex_1.findall(contents))
    print("Matched Regex 1 - " + str(count_c))

print("Checking Regex 2")
if regex_2.search(contents):
    count_d = len(regex_2.findall(contents))
    print("Matched Regex 2 - " + str(count_d))
else:
    print("No match found")
Barmar