
So I was trying to match each line of a file against a regex, and I did the following:

import re
regex='\S+\s+(\S{6})\s+VAR'
with open('/home/jyt109/humsavar.txt') as humsavar:
    for line in humsavar:
        match=regex.search(line)
        print match.group(1)

The expected output is the particular 6-character field in each line; instead, I get the following error:

Traceback (most recent call last):
  File "exercise.py", line 74, in <module>
    match=regex.search(line)
AttributeError: 'str' object has no attribute 'search'

I found out (from the link below) that to match a regex against each line of a file, the file first has to be turned into a list with file.read()

Match multiline regex in file object

To restate the question: is there a simpler way to do this, preferably in one line instead of two?

humsavar=open('/home/jyt109/humsavar.txt')
text=humsavar.read()

Thanks!

noqa

3 Answers


I think you may have misunderstood what that link was saying. If matches of your regex can span multiple lines, then you need to read the file using file.read(). If newlines will never be a part of a match, then you can read the file line by line and try to match each line separately.
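A small sketch of that distinction, assuming Python 3 and using io.StringIO as an in-memory stand-in for the file. The sample data and the newline-spanning pattern are made up for illustration:

```python
import io
import re

# Hypothetical sample data: the match we want spans a line break.
data = "header\nx 000001 VAR\n"

# A pattern that contains a newline, so it can only match across lines.
pattern = re.compile(r'header\n(\S+)')

# Whole-file approach: read everything into one string, then search it.
whole = pattern.search(io.StringIO(data).read())
print(whole.group(1))  # x

# Line-by-line approach: each line is searched on its own, so the
# newline-spanning pattern never finds a match.
per_line = [pattern.search(line) for line in io.StringIO(data)]
print(per_line)  # [None, None]
```

For a pattern like the one in the question, which never contains a newline, both approaches find the same fields.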

If you want to check each line separately, you can use file.readlines() to get a list of lines or just iterate over the file object, for example:

import re
regex = re.compile(r'\S+\s+(\S{6})\s+VAR')  # compile first: a plain str has no .search()
with open('/home/jyt109/humsavar.txt') as f:
    for line in f:
        match = regex.search(line)

Assuming you still want to read the entire file contents at once, you can do that in one line like this:

text = open('/home/jyt109/humsavar.txt').read()
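That one-liner leaves closing the file to the garbage collector (immediate in CPython, but not guaranteed in other implementations). A minimal sketch of two alternatives that close the file deterministically, assuming Python 3 (pathlib.Path.read_text needs 3.5+). The temporary file is a hypothetical stand-in for humsavar.txt:

```python
import os
import pathlib
import tempfile

# Hypothetical stand-in for '/home/jyt109/humsavar.txt': a small temporary file.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('x 000001 VAR\n')

# Two-line version: the with-block closes the file when it exits.
with open(path) as f:
    text = f.read()

# One-line version (Python 3.5+): read_text() opens, reads and closes the file.
text2 = pathlib.Path(path).read_text()

print(text == text2)  # True
os.remove(path)
```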
Andrew Clark

.read() does not turn a file into a list (.readlines() does); instead it puts the entire file into a string.
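To see the difference, here is a small sketch assuming Python 3, with io.StringIO standing in for a real file and made-up sample lines:

```python
import io

data = "x 000001 VAR\nx 000002 VAR\n"

# read() returns the whole contents as a single string.
as_string = io.StringIO(data).read()
print(repr(as_string))  # 'x 000001 VAR\nx 000002 VAR\n'

# readlines() returns a list with one string per line, newlines included.
as_list = io.StringIO(data).readlines()
print(as_list)  # ['x 000001 VAR\n', 'x 000002 VAR\n']
```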

But even then you can use a regex: when compiling it with re.MULTILINE, the anchors ^ and $ will match the starts and ends of individual lines:

>>> import re
>>> text = open('/home/jyt109/humsavar.txt').read()
>>> regex = re.compile(r"^Match this regex in each line$", re.MULTILINE)
>>> regex.findall(text)

The result will be a list of all matches.
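As a concrete sketch of what re.MULTILINE changes, here is the question's pattern with ^ and $ anchors added, run on made-up sample text (Python 3):

```python
import re

# Made-up sample text in the shape the question's pattern expects.
text = "x 000001 VAR\ny 000002 OTHER\nz 123456 VAR\n"

# Without re.MULTILINE, ^ and $ only match at the start and end of the whole string.
single = re.findall(r'^\S+\s+(\S{6})\s+VAR$', text)

# With re.MULTILINE, ^ and $ also match at every line boundary.
multi = re.compile(r'^\S+\s+(\S{6})\s+VAR$', re.MULTILINE).findall(text)

print(single)  # []
print(multi)   # ['000001', '123456']
```

Because the pattern has one capturing group, findall returns just the captured 6-character fields rather than the whole matching lines.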

Tim Pietzcker

Here is a simple one-liner. I tested it on the data file below. When writing regular expressions, it is convenient to use raw string notation (r'...'), as I have done below, so that backslashes reach the regex engine unchanged. I don't know what your data file is meant to look like, so I made one up that matches the search pattern you specified.

code

import re
print(re.findall(r'\S+\s+(\S{6})\s+VAR', open('/tmp/test.txt').read()))

output

['000001', '000002', '123456']

test.txt

x 000001 VAR
x 000002 VAR
x 123456 VAR
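To illustrate why raw string notation matters: the question's pattern happens to work without it because \S and \s are not Python string escapes, so the backslash survives (recent Python 3 releases do warn about such invalid escape sequences). An escape that Python does recognize, such as \b, behaves differently. A small sketch with a made-up line, assuming Python 3:

```python
import re

line = "x 000001 VAR"

# In a normal string literal '\b' is the backspace character (\x08),
# so the regex engine receives a literal backspace, not a word boundary.
plain = re.search('\bVAR\b', line)

# r'\b' keeps the backslash, so the regex engine sees the \b word-boundary token.
raw = re.search(r'\bVAR\b', line)

print(plain)        # None
print(raw.group())  # VAR
```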
Marwan Alsabbagh