2

I am learning to code in Python. Now I am experimenting with a file comparison program from here.

My code is:

#!/usr/bin/python3

def main():
    fhand1 = open('mbox.txt')
    print('file handle for mbox is {}'.format(fhand1))
    count = 0
    for l1 in fhand1:
        count = count + 1
        l1 = l1.rstrip()
        if l1.startswith('From:'):  # Skip 'uninteresting lines'
            print('{}'.format(l1))
    print('Number of lines: {}'.format(count))

    fhand2 = open('mbox-short.txt')
    #inp = fhand2.read(), when here for loop does not work
    #for l2 in fhand2:
        #if l2.startswith('From:'):
            #print('{}'.format(l2))
    inp = fhand2.read()  # if the for loop above is active, then this does not work
    print('Total characters in mbox-short: {}'.format(len(inp)))
    print('First 56 characters of mbox-short: {}'.format(inp[:56]))

if __name__ == "__main__": main()

My question is about 'mbox-short.txt'. When I put inp = fhand2.read() before the for l2 in fhand2: loop, the for loop does not run. When I swap the order, the read() operation does not work.

Can someone please explain this?

Btw, I am using JetBrains PyCharm Community Ed 4 IDE.

Thank you in advance.

algoProg

4 Answers

1

read() returns the full contents of the file and leaves the file position at the end of the file. Whichever operation comes second, the for loop or the second read(), starts from that end position and has nothing left to read. That is why you get an empty string.

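You need to either read the file once and loop over the lines of the resulting string:

fhand2 = open('mbox-short.txt')
inp = fhand2.read()  # read the whole file once
for l2 in inp.splitlines():  # loop over the string, not the exhausted file handle
    if l2.startswith('From:'):
        print('{}'.format(l2))
# no second fhand2.read() is needed; inp already holds the contents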

or this:

fhand2 = open('mbox-short.txt')
for l2 in fhand2:
    if l2.startswith('From:'):
        print('{}'.format(l2))
fhand2.close()
fhand2 = open('mbox-short.txt')  # re-open the file you have already read
inp = fhand2.read()

See more information on the python i/o here.
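
A third option, not part of the answer above, is to rewind the same handle with seek(0) instead of re-opening the file. Here is a minimal sketch of that idea. The temporary file and its contents are stand-ins I made up for mbox-short.txt so that the snippet runs on its own.

```python
import os
import tempfile

# Hypothetical stand-in for mbox-short.txt so the sketch runs anywhere.
sample = "From: alice@example.com\nSubject: hi\nFrom: bob@example.com\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

with open(path) as fhand2:
    inp = fhand2.read()          # the position is now at the end of the file
    second_read = fhand2.read()  # nothing is left, so this is an empty string
    fhand2.seek(0)               # rewind to the beginning of the file
    from_lines = [l2.rstrip() for l2 in fhand2 if l2.startswith('From:')]

os.remove(path)  # clean up the temporary file

print(len(inp), repr(second_read), from_lines)
```

Opening the file in a with block also closes it for you, which the original code never does.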

Yevgen
0

The read() method will read the full file into a string. So if say your file looks like

1 2 3 4
5 6 7 8

read() will return "1 2 3 4\n5 6 7 8\n" and leave the file handle at the end of the file, so a following for l2 in fhand2 has nothing left to loop over. If you loop over the returned string instead, you go through it character by character, i.e. 1, a space, 2 and so on.

If you want to read line by line, use either readline(), which fetches the next line, or readlines(), which returns a list such as ["1 2 3 4\n", "5 6 7 8\n"].
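
To make the difference concrete, here is a small sketch using the two-line example from above. I am using io.StringIO as a stand-in for a file opened in text mode, which is an assumption for the demo; a handle returned by open() should behave the same way for these calls.

```python
import io

text = "1 2 3 4\n5 6 7 8\n"

# io.StringIO stands in for a file opened in text mode (demo assumption).
fhand = io.StringIO(text)

whole = fhand.read()           # one string holding the entire contents
chars = list(whole)[:3]        # iterating over a string yields single characters
leftover = list(fhand)         # the handle itself is exhausted after read()

fhand.seek(0)                  # rewind so the line-based calls have data again
first_line = fhand.readline()  # only the next line
rest = fhand.readlines()       # all remaining lines as a list

print(chars, leftover, repr(first_line), rest)
```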

hyades
0

inp = fhand2.readlines() should fix your problem: it returns a list of lines that you can loop over as often as you like. For reference, see "How do I read a file line-by-line into a list?"
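
As a rough sketch of how that could look for the code in the question: read the lines once, then use the list both for the 'From:' filter and for the character count. The sample text and io.StringIO are hypothetical stand-ins for open('mbox-short.txt').

```python
import io

# Hypothetical stand-in for open('mbox-short.txt').
sample = "From: a@example.com\nX-Header: 1\nFrom: b@example.com\n"
fhand2 = io.StringIO(sample)

lines = fhand2.readlines()  # consumes the handle once, keeps every line in a list

# The list can be traversed any number of times, unlike the handle.
from_lines = [l2.rstrip() for l2 in lines if l2.startswith('From:')]
total_chars = sum(len(l2) for l2 in lines)

print(from_lines)
print('Total characters: {}'.format(total_chars))
```

Keep in mind that readlines() holds the whole file in memory, just like read() does.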

santhosh
0

Calling .read() on a file object consumes it, so you can't loop over its lines afterwards. You can test this by calling read() with the optional [size] argument. The size of mbox-short.txt is 94626. Calling read(94625) reads the first 94625 bytes of your file into a string. You can then loop over the remaining 1 byte in the file object (which is the newline character \n). By default, file.read([size]) reads the whole file into a string, so nothing remains to iterate over.

filehandle = open("mbox-short.txt")
fileString = filehandle.read(94625)
print(len(fileString))
count = 0
for x in filehandle:
    print(x)
    count += 1
print(count)

See: https://docs.python.org/2/library/stdtypes.html?highlight=read#file.read

(I can't find file.read() in the Python 3 documentation, but I assume it hasn't changed between versions.)

Osvald Laurits
  • Thank you! Cannot upvote it because of the low rep, but it was helpful in understanding the concept. – algoProg Mar 02 '15 at 10:29