3

I have 30 text files of 30 lines each. For some reason, I need to write a script that opens file 1, prints line 1 of file 1, closes it, opens file 2, prints line 2 of file 2, closes it, and so on. I tried this:

import glob

files = glob.glob('/Users/path/to/*/files.txt')             
for file in files:
    i = 0
    while i < 30:
        with open(file,'r') as f:
            for index, line in enumerate(f):
                if index == i:
                    print(line)
                    i += 1
                    f.close()
            continue 

Obviously, I got the following error:

ValueError: I/O operation on closed file.

Because of the `f.close()` call. How can I move on to the next file after reading only the desired line?

  • 3
    You can use `break` to exit a loop; replace `f.close()` with that. The `continue` at the bottom is also unnecessary, and the outer loop can be a `for i in range(0, 30):` (or `i, file in enumerate(files)`?) without explicitly incrementing `i`. – Ry- Feb 15 '17 at 02:40
  • 1
    Note following up on @Ryan: The `f.close()` isn't needed at all because you (correctly) used the `with` statement when `open`ing the file, ensuring that it is automatically closed when you exit the block. – ShadowRanger Feb 15 '17 at 02:46
  • Side-note: You could remove the explicit inner loop entirely using `itertools.islice`. Replace the whole contents of the `with` block with `print(next(itertools.islice(f, i, None)))`, no need for explicit looping of any kind. This requires @Ryan's suggested change of replacing the outer `while` loop with a `for i, file in enumerate(files):` (or to ensure you only process 30 files, `for i, file in enumerate(islice(files, 30)):`) so you're not manually tracking/incrementing `i`. – ShadowRanger Feb 15 '17 at 02:52

4 Answers

6

First off, to answer the question, as noted in the comments, your main problem is that you close the file then try to continue iterating it. The guilty code:

        for index, line in enumerate(f): # <-- Reads
            if index == i:
                print(line)
                i += 1
                f.close()                # <-- Closes when you get a hit
                                         # But loop is not terminated, so you'll loop again

The simplest fix is to just `break` instead of explicitly closing, since your `with` statement already guarantees deterministic closing when the block is exited:

        for index, line in enumerate(f):
            if index == i:
                print(line)
                i += 1
                break

But because this was fun, here's a significantly cleaned up bit of code to accomplish the same task:

import glob
from itertools import islice

# May as well use iglob since we'll stop processing at 30 files anyway    
files = glob.iglob('/Users/path/to/*/files.txt')

# Stop after no more than 30 files, use enumerate to track file num
for i, file in enumerate(islice(files, 30)):
    with open(file,'r') as f:
        # Skip the first i lines of the file, then print the next line
        print(next(islice(f, i, None)))
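One caveat: if a file has fewer than `i + 1` lines, that `next(...)` call raises `StopIteration`. `next` accepts a default for exactly this case; a small helper (a sketch, and `line_at` is a made-up name, not anything from the standard library) makes it reusable:

```python
from itertools import islice

def line_at(lines, i, default=None):
    """Return element i (0-based) of any iterable, or default if it's too short."""
    return next(islice(lines, i, None), default)

# Works on a list of lines exactly like it does on an open file object:
print(line_at(['a\n', 'b\n', 'c\n'], 1))         # prints 'b' (plus the line's own newline)
print(line_at(['a\n'], 5, default='<missing>'))  # prints '<missing>' instead of raising
```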
ShadowRanger
2

You can use the linecache module to get the line you need and save yourself a lot of headache:

import glob
import linecache

line = 1
for file in glob.glob('/Users/path/to/*/files.txt'):
    print(linecache.getline(file, line))
    line += 1
    if line > 30:  # if you really need to limit it to only 30
        break
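One thing to keep in mind: `linecache` reads and caches each file's entire contents on first access, and keeps it cached. Once you're done with the files, the memory can be released explicitly with `linecache.clearcache()`. A minimal sketch (the temp file is just for demonstration; any path behaves the same way):

```python
import linecache
import os
import tempfile

# A throwaway demo file standing in for one of the real input files.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')

print(linecache.getline(tmp.name, 2))  # caches the whole file, prints 'second'

linecache.clearcache()  # drop everything linecache has cached
os.unlink(tmp.name)
```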
zwer
  • 1
    Good suggestion, though I will note that `linecache` caches the whole file into memory to get a single line; this is usually not a problem for smallish files (e.g. the source files the module was originally designed for), particularly if you need to perform random access for multiple lines, but for arbitrary inputs, you can end up reading a GB file into memory (where the lines require far more than a GB of memory thanks to Python overhead) even if all you want is the first line of the file. It would also make sense to avoid manual `line` tracking, and just wrap the `glob` call in `enumerate`. – ShadowRanger Feb 15 '17 at 02:56
  • True, while very convenient `linecache` can eat up memory but I didn't get the notion that OP will have large files to deal with. One can always call `clearcache()` after dealing with it if access to the files is no longer required. And if access to really huge files is required, going through them line by line (the traditional way) would probably have horrible performance either - if that was the requirement I'd rather suggest using the `mmap` module and let the OS optimize access to the data. – zwer Feb 15 '17 at 03:16
  • Thanks! That worked perfectly, although I had to replace `line 0` by `line 1`. – partialcorrelations Feb 15 '17 at 03:18
  • Ooops, forgot that `linecache` line index starts with 1. Fixed. – zwer Feb 15 '17 at 03:20
  • @zwer: Going through them line by line until you reach the target line would be fine if you're only accessing the first 30 lines or fewer; doesn't matter how large the file itself is, the time to read the first 30 lines is tied to the size of the first 30 lines, not the size of the file. There are [line oriented uses for `mmap`](http://stackoverflow.com/a/34029605/364696), but it wouldn't help much here; you'd still need to scan for line breaks. You could skip an arbitrary number of bytes, then look for a nearby line, but w/o fixed length lines, that wouldn't get you a specific line number. – ShadowRanger Feb 15 '17 at 03:26
  • Clearing the cache helps avoid holding all 30 files in memory, but it won't stop you from reading in the whole file in the first place, which can hurt if the file is huge. Mind you, I'm not saying this is a bad answer because of the scaling issue (I meant it when I said this was a good suggestion). It's the simplest code; if you'll never encounter files larger than a few MB, it's a great approach. Just wanted to make it clear that it can behave poorly given large inputs; the `linecache` docs don't mention that it slurps the whole file into cache on first access; could be a nasty surprise. – ShadowRanger Feb 15 '17 at 03:30
0

I think something like this is what you want:

import glob

files = glob.glob('/Users/path/to/*/files.txt')             
for file in files:
    i = 0
    while i < 30:
        with open(file,'r') as f:
            for index, line in enumerate(f):
                if index == i:
                    print(line)
                    i += 1
                    break

Currently you are closing the file in the middle of the for loop and then trying to read it in again. With `break` there is no need for an explicit `f.close()` at all: the `with` statement already closes the file as soon as you leave its block. (Note that this keeps your original structure, so it prints the first 30 lines of every file; if the goal is only line n of file n, the `enumerate`-based approaches in the other answers are the way to go.)

0

Split your job into simpler steps, until the final step is trivial. Use functions.

Remember that a file object works as a sequence of lines.

import glob

def nth(n, sequence):
  for position, item in enumerate(sequence):
    if position == n:
      return item
  return None  # if the sequence ended before position n

def printNthLines(glob_pattern):
  # Note: sort the file names; glob does not guarantee any order.
  filenames = sorted(glob.glob(glob_pattern))
  for position, filename in enumerate(filenames):
    with open(filename) as f:
      line = nth(position, f)  # Pick the n-th line.
      if line is not None:
        print(line)
      # IDK what to do if there's no n-th line in n-th file

printNthLines('path/to/*/file.txt')

Obviously we scan the n-th file up to its n-th line, but this is inevitable: there is no way to jump directly to the n-th line of a plain-text file.
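Because a file object is just one kind of sequence of lines, the same `nth` helper can be sanity-checked on any iterable:

```python
def nth(n, sequence):
  for position, item in enumerate(sequence):
    if position == n:
      return item
  return None  # the sequence ended before position n

print(nth(2, ['a', 'b', 'c']))  # c
print(nth(0, iter(range(5))))   # 0
print(nth(5, 'abc'))            # None: the string is too short
```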

9000