12

I'm writing an assignment to count the number of vowels in a file. So far in my class we have only used code like this to check for the end of a file:

vowel=0
f=open("filename.txt","r",encoding="utf-8" )
line=f.readline().strip()
while line!="":
    for j in range (len(line)):
        if line[j].isvowel():
            vowel+=1

    line=f.readline().strip()

But this time for our assignment the input file given by our professor is an entire essay, so there are several blank lines throughout the text to separate paragraphs and whatnot, meaning my current code would only count until the first blank line.

Is there any way to check whether my file has reached its end, other than checking whether the line is blank? Preferably in a similar fashion to my current code, where it checks for something on every iteration of the while loop.

Thanks in advance

Karl Knechtel
ay lmao
  • 2
    This isn't really worth an answer on its own, but if you skip calling `strip()` on your lines, the rest of your code would work just fine. The call to `readline()` on a line with no text will return `"\n"`, while at the end of the file it will return `""` (an empty string). Another alternative is to call `read()` to get all of the file's text in a single long string which you can then iterate over. You don't actually need to count the vowels line by line. – Blckknght Mar 13 '15 at 00:44
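
The two alternatives from the comment above can be sketched roughly as follows. This is only an illustration under assumptions: `VOWELS`, `count_vowels_readline` and `count_vowels_read` are hypothetical names, and the vowel set stands in for the question's `isvowel()`, which is not a built-in `str` method.

```python
# Assumed vowel set; stands in for the question's isvowel().
VOWELS = set("aeiouAEIOU")

def count_vowels_readline(path):
    """Count vowels with a while loop, treating '' from readline() as EOF."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        line = f.readline()      # no strip(): a blank line is "\n", not ""
        while line != "":        # "" is only returned at the end of the file
            count += sum(ch in VOWELS for ch in line)
            line = f.readline()
    return count

def count_vowels_read(path):
    """Same count, reading the whole file as one string."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(ch in VOWELS for ch in f.read())
```

Both functions should give the same total; the first keeps the per-iteration check the question asks for, while the second is shorter but loads the whole file into memory at once.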

4 Answers

35

Don't loop through a file this way. Instead use a for loop.

for line in f:
    vowel += sum(ch.isvowel() for ch in line)

In fact your whole program is just:

VOWELS = {'A','E','I','O','U','a','e','i','o','u'}
# I'm assuming this is what isvowel checks, unless you're doing something
# fancy to check if 'y' is a vowel
with open('filename.txt') as f:
    vowel = sum(ch in VOWELS for line in f for ch in line.strip())
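
To check that a `for` loop over a file keeps going past blank lines, here is a small sketch that uses `io.StringIO` as a stand-in for the opened file (an assumption made only so the example runs without a file on disk; `count_vowels` and the sample text are hypothetical).

```python
import io

VOWELS = set("aeiouAEIOU")

def count_vowels(f):
    """Count vowels in any iterable of lines, such as an open text file."""
    return sum(ch in VOWELS for line in f for ch in line)

# Two paragraphs separated by a blank line, like the essay in the question.
essay = io.StringIO("First paragraph.\n\nSecond paragraph.\n")
total = count_vowels(essay)
print(total)  # 9: the loop only ends at the real end of the text
```

The blank line is simply yielded as `"\n"`, contributes no vowels, and the iteration carries on.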

That said, if you really want to keep using a while loop for some misguided reason:

while True:
    line = f.readline().strip()
    if line == '':
        # either end of file or just a blank line.....
        # we'll assume EOF, because we don't have a choice with the while loop!
        break
Adam Smith
  • @Malonge no, he's saying that the way he was taught to iterate through a file is to look for an empty line, which indicates the end of the file. In THIS file, there are empty lines that aren't the end of the file, so he can't do that. – Adam Smith Mar 12 '15 at 23:46
  • I didn't know there was something like "for line in f", it's working perfectly now with the use of it :) thanks for your help! – ay lmao Mar 12 '15 at 23:50
  • @JoeyZhang `for` loops are massively preferable in Python because SO MANY things are iterable. Not only that but anything you make can BECOME an iterable REALLY EASILY. Use them everywhere. If you're using another kind of loop, strongly consider refactoring so you can use a `for` loop -- treat it like a code smell. – Adam Smith Mar 12 '15 at 23:53
  • 1
    Nitpicking... you should add `as f` to the `with` statement. – mhawke Mar 13 '15 at 00:23
  • 1
    Python doesn't throw EOFError for file.readline() (though I wish it did and many answers claim it does). Instead, look for an unterminated empty line (e.g., `line=f.readline(); if not line: ...`) See also https://docs.python.org/2/library/exceptions.html#exceptions.EOFError – MartyMacGyver Jan 21 '16 at 01:45
  • @AdamSmith Personally I wish there was a consistent EOF test in Python, but it's an old debate apparently. Testing for a blank line without an EOL is as close as you get (but beware prematurely stripping EOL char(s)... otherwise it would falsely trigger on any blank line.) https://mail.python.org/pipermail/python-dev/2001-January/011445.html – MartyMacGyver Jan 21 '16 at 04:38
  • 4
    It would take me a term of my natural life to get 17.8k reputation in SO but there is a correction `if line == '':` and not `if line = ''`. I know that is not remotely connected as part of the discussion but felt like pointing it out. :) :) @AdamSmith – Durwasa Chakraborty Jan 24 '16 at 08:48
  • 1
    @DurwasaChakraborty thanks! Doesn't matter how long you've been coding, you can still typo :) – Adam Smith Jan 24 '16 at 18:03
  • I know it isn't the case in this example, but it seems the best place to ask this question, if I'm iterating with a `while` loop for the benefit of being able to use a flag, is it still more efficient in python to use a `for` loop and go through the entire file or do a while, and when the hit is found, use the flag to break? – Maor Mar 17 '16 at 14:28
  • 1
    @Maor I know this is very old, but I'm assuming you're looking at something like `while not flag: do_stuff()` versus `for thing in stuff: do_stuff(); if flag: break`. I would GENERALLY prefer a `for` loop there, but it strongly depends on individual circumstance. You're not going to see much of a performance difference, so prefer instead to make your code more readable. – Adam Smith Sep 25 '16 at 18:53
  • wait til blank is ok. according to the doc, "When size is not 0, an empty string is returned only when EOF is encountered immediately." While loop is not misguided. It depends on usage. For example, the loop may be from 1 to n where n depends on the data in the file. – thang May 28 '17 at 18:56
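
The `for` loop with an early `break` discussed in the comments above might look something like this. It is a sketch only: `find_first` and `needle` are made-up names, and `f` is assumed to be an open text file or any other iterable of lines.

```python
import io

def find_first(f, needle):
    """Return (line_number, line) for the first line containing needle, or None."""
    found = None
    for number, line in enumerate(f, start=1):
        if needle in line:
            found = (number, line.rstrip("\n"))
            break                # stop reading as soon as there is a hit
    return found

sample = io.StringIO("alpha\n\nbeta\ngamma\n")
hit = find_first(sample, "beta")
```

The `break` plays the role of the flag in the `while not flag:` version, and the rest of the file is never read once a match is found.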
1

Find end position of file:

f = open("file.txt","r")
f.seek(0,2)              #Jumps to the end
endLocation = f.tell()   #Stores the end location (a byte offset in binary mode, an opaque number in text mode)
f.seek(0)                #Jump to the beginning of the file again

Then you can do:

if line == '' and f.tell() == endLocation:
   break
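
Put together with the vowel count from the question, the idea could be sketched as below. Two assumptions that are not part of the answer: the file is opened in binary mode, because `tell()` is documented as a plain byte offset only there, and `VOWELS` and `count_vowels_until_end` are hypothetical names.

```python
VOWELS = set("aeiouAEIOU")  # assumed vowel set

def count_vowels_until_end(path):
    """Count vowels, stopping only when a blank read coincides with the end offset."""
    count = 0
    with open(path, "rb") as f:      # binary mode: tell() is a byte offset
        f.seek(0, 2)                 # jump to the end
        end_location = f.tell()      # remember where the file ends
        f.seek(0)                    # back to the beginning
        while True:
            line = f.readline().decode("utf-8").strip()
            if line == "" and f.tell() == end_location:
                break                # blank AND at the end: really done
            count += sum(ch in VOWELS for ch in line)
    return count
```

A blank line in the middle of the file gives `line == ""` but a position short of `end_location`, so the loop continues.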
Punnerud
0
import io

f = io.open('testfile.txt', 'r')
line = f.readline()
while line != '':
    print(line, end='')  # line already ends with '\n'
    line = f.readline()
f.close()
-3

I discovered while following the above suggestions that `for line in f:` does not work for a pandas DataFrame (not that anyone said it would), because iterating over a DataFrame yields its column labels, not its rows. For example, if you have a DataFrame with 3 fields (columns) and 9 records (rows), the for loop will stop after the 3rd iteration, not after the 9th.
Teresa

  • 3
    Because pandas and dataframes within pandas were not mentioned in this question, I would advise this to be made into a comment on the original post rather than an answer. (But, this is a little late to the post so maybe we just leave it alone.) – BlackVegetable Apr 12 '17 at 21:09