This is my first time using the re package in Python.

In order to understand it better, I decided to copy a poem into my file and play around with re.search() using different regular expressions.

I got the poem from the following website and copied it into my text file: http://www.poets.org/poetsorg/poem-day

I have also consulted several related questions while trying to resolve my issue.

Here is my code:

import re

searchFile = open('/Users/admin/Documents/Python/NLP/Chapter1-TextSample.txt', 'r')

for line in searchFile:
    if re.search('[pP]igeons', line):
        print line

The pigeons ignore us gently as we
scream at one another in the parking
lot of an upscale grocer. 

Pigeons scoot,and finches hop, and cicadas shout and shed
themselves into loose approximations of what
we might have in a different time called heaven.


for line in searchFile:
    if re.search('[pP]igeons', line):
        print line


for line in searchFile:
    print line

As you can see, the first search returns the correct results. However, when I run the same search again, or even simply try to print the lines of the file, nothing shows up. Yet when I check the 'searchFile' object, it still exists, as seen below:

In[23]:  searchFile
Out[23]: <open file '/Users/admin/Documents/Python/NLP/Chapter1-TextSample.txt', mode 'r' at 0x103a85d20>

Can someone please explain why this happens? Am I missing something?

user3694373

3 Answers

You've reached the end of the file: after the first loop the file position sits at the end, so there is nothing left to read. You should be able to do this to go back to the beginning:

searchFile.seek(0)
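
To see the effect for yourself, here is a minimal sketch. It assumes a throwaway temporary file with made-up sample text rather than your actual poem file, and the helper name count_matches is hypothetical. It uses print() so that it also runs on Python 3.

```python
import os
import re
import tempfile

# Hypothetical sample text standing in for the poem file.
SAMPLE = "The pigeons ignore us\nscream at one another\nPigeons scoot\n"

def count_matches(file_obj):
    """Count matching lines, reading from the file's current position."""
    return sum(1 for line in file_obj if re.search('[pP]igeons', line))

# Write the sample to a temporary file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write(SAMPLE)

searchFile = open(path, 'r')
first = count_matches(searchFile)    # reads through to the end of the file
second = count_matches(searchFile)   # position is already at the end
searchFile.seek(0)                   # rewind to the beginning
third = count_matches(searchFile)    # reads the whole file again
searchFile.close()
os.remove(path)

print(first, second, third)
```

The second count comes out as zero because nothing moved the position back; only after seek(0) does the loop see the lines again.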
Kevin

Because after the first loop, you've reached the end of the file. Also, you should open the file with the with statement, which closes it automatically when the block ends.

with open('.../Chapter1-TextSample.txt', 'r') as searchFile:
    for line in searchFile:
        if re.search('[pP]igeons', line):
            print line
    searchFile.seek(0)
    # loop again
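
A fuller sketch of the same idea, in case it helps. The temporary file, its contents, and the names first_pass, second_pass and closed_after are assumptions made up for illustration; swap in your own path. It also precompiles the pattern, which is optional.

```python
import os
import re
import tempfile

# Hypothetical stand-in for the poem file, so the example runs anywhere.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as tmp:
    tmp.write("Pigeons scoot\nand finches hop\nThe pigeons ignore us\n")

pattern = re.compile('[pP]igeons')

with open(path, 'r') as searchFile:
    # First pass over the file.
    first_pass = [line.rstrip('\n') for line in searchFile if pattern.search(line)]
    # Rewind, otherwise the second pass would find nothing.
    searchFile.seek(0)
    second_pass = [line.rstrip('\n') for line in searchFile if pattern.search(line)]

# Leaving the with block has closed the file; no explicit close() is needed.
closed_after = searchFile.closed
os.remove(path)

print(first_pass)
print(second_pass)
print(closed_after)
```

Note that seek(0) has to happen inside the with block, since the file is closed once the block exits.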
dursk

Actually, this problem has nothing to do with re; it's about the searchFile object.

Reading from a file, or iterating over it, consumes it: the file position advances to the end, and further reads return nothing. See:

>>> f = open("test")
>>> f.read()
'qwe\n'
>>> f.read()
''

You can read the file once into a variable and reuse that as many times as you like:

l = searchFile.readlines()

for i in l:
   ...

for i in l:
   ...
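
Filled in, that might look something like the sketch below. Here io.StringIO stands in for your opened file so the example is self-contained, and the sample text and the names lines, leftover and matches are made up for illustration.

```python
import io
import re

# io.StringIO behaves like an opened text file; it stands in for searchFile.
searchFile = io.StringIO(u"The pigeons ignore us\nin the parking\nPigeons scoot\n")

lines = searchFile.readlines()   # read everything once into a list
leftover = searchFile.read()     # the stream is now consumed, so this is empty

# The list can be iterated as many times as needed.
matches = [line for line in lines if re.search('[pP]igeons', line)]
for line in lines:
    print(line.rstrip('\n'))

print(matches)
```

This trades memory for convenience: the whole file is held in the list, which is fine for a poem but may not be for very large files, where seek(0) is the lighter option.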
utdemir