I am new to Python and am using it to read a large file. To do this, I iterate over the file object, as described in the sixth answer (by jyoti das) here: How can I read large text files in Python, line by line, without loading it into memory?

My code:

with open(filename, 'r', buffering=100000) as f:
    time_data_count = 0
    for line in f:
        if 'TIME_DATA' in f:
            time_data_count += 1
    if time_data_count > 20:
        print("time_data complete")
    else:
        print("incomplete time_data data")

However, my code only reads the first line of the file and then exits the loop, so time_data_count stays at 0. Why is this?

I have tried stepping through the code, but I don't see why it stops after the first line.

edo101

1 Answer

You tested `if 'TIME_DATA' in f:`, which consumes the whole file looking for the string (which it won't find unless the last line is just that string, and isn't newline terminated). That means the file iterator is exhausted when the `for` loop tries to go to the next line.
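A minimal sketch of that exhaustion, assuming `io.StringIO` as a stand-in for the opened file (the sample contents are made up for illustration):

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo).
f = io.StringIO("header\nTIME_DATA 1\nTIME_DATA 2\n")

first_line = next(f)        # the for loop fetches the first line
found = "TIME_DATA" in f    # compares every remaining line with ==, consuming them all
leftover = list(f)          # the iterator is now exhausted

print(first_line == "header\n", found, leftover)
```

No line is exactly equal to `'TIME_DATA'`, so `found` is false, and the loop has nothing left to read afterwards.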

You meant to test `if 'TIME_DATA' in line:`.
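As a sketch, the corrected loop could look like this. The helper name `count_time_data` and the temporary sample file are hypothetical, added only so the example runs on its own:

```python
import os
import tempfile


def count_time_data(path, marker="TIME_DATA"):
    """Count the lines that contain marker, reading one line at a time."""
    count = 0
    with open(path, "r") as f:
        for line in f:
            if marker in line:  # substring test on the current line, not on f
                count += 1
    return count


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nTIME_DATA 1\nother\nTIME_DATA 2\n")
    sample_path = tmp.name

sample_count = count_time_data(sample_path)
os.remove(sample_path)
print(sample_count)
```

The threshold check from the question (`time_data_count > 20`) can then be applied to the returned count.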

ShadowRanger
  • But if it searches through the whole file looking for that string, why doesn't it find it then? It should find it no? – edo101 Jun 01 '20 at 19:51
  • @edo101: It's searching for a *line* that is *exactly* equal to the string. It's like testing `if 'A' in ['A\n', 'B\n', 'C\n']:`, not testing `if 'A' in 'A\nB\nC\n':`. So it won't find substrings, only complete lines. And since every line except possibly the last ends with a newline by definition, only the last line could possibly match. – ShadowRanger Jun 01 '20 at 21:31
  • I just thought of something regarding "breaking the file into byte chunks". While I like this method, you run the risk of having a line in your text split across chunks. I saw this personally, which means that if you are searching for a string in the file like I was, you could miss some matches because the lines they were on were split across chunks. This hasn't happened to me yet, but it seems very likely to happen at some point. Is there a way to get around this? Using readlines didn't work well, as I got miscounts. @ShadowRanger – edo101 Jun 03 '20 at 02:40
  • @edo101: Huh? We're not breaking anything into byte chunks; iterating by line always returns complete lines. I didn't suggest using `readlines` (in fact, I specifically recommended *not* doing so; the other answer has been wrong in multiple ways), because it's not needed. I have no idea what you think is going to go wrong, but it's not going to happen; if you read a single file, line by line, as in your code, but with the `if` check fixed, you will not skip lines, you will not receive incomplete lines, it will just work. – ShadowRanger Jun 03 '20 at 11:11
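
To check the last point, here is a small sketch that reads a file line by line with a deliberately tiny buffer and compares the result with what was written. The file contents and the `buffering=16` value are assumptions chosen for the demonstration:

```python
import os
import tempfile

# Hypothetical data: 1000 lines, each much longer than the 16-byte buffer.
expected = [f"row {i} TIME_DATA payload\n" for i in range(1000)]

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(expected)
    path = tmp.name

got = []
with open(path, "r", buffering=16) as f:
    for line in f:  # each iteration yields one complete line
        got.append(line)
os.remove(path)

all_complete = got == expected
print(all_complete)
```

The buffer size only affects how much is fetched from disk at a time; line iteration still reassembles whole lines before yielding them.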