
I have several large files and need to yield their lines round-robin, one line from each file in turn. Something like this pseudocode:

    def get(self):
        with open(file_list, "r") as files:
            for file in files:
                yield file.readline()

How would I do this?

Z4-tier
Harald Thomson
  • Found a possible duplicate: https://stackoverflow.com/questions/4617034/how-can-i-open-multiple-files-using-with-open-in-python?rq=1 – Harald Thomson Feb 07 '20 at 10:04

2 Answers


The itertools documentation has several recipes, among them a very neat round-robin recipe. I would also use ExitStack to work with multiple file context-managers:

from itertools import cycle, islice
from contextlib import ExitStack

# https://docs.python.org/3.8/library/itertools.html#itertools-recipes
def roundrobin(*iterables):
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # Remove the iterator we just exhausted from the cycle.
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))

...

def get(self):
    with open(files_list) as fl:
        filenames = [x.strip() for x in fl]
    with ExitStack() as stack:
        files = [stack.enter_context(open(fname)) for fname in filenames]
        yield from roundrobin(*files)

That said, perhaps the best design is inversion of control: pass the sequence of file objects to .get as an argument, so that the calling code is responsible for opening the files and managing the exit stack:

class Foo:
    ...
    def get(self, files):
        yield from roundrobin(*files)

# calling code:
foo = Foo() # or however it is initialized

with open(files_list) as fl:
    filenames = [x.strip() for x in fl]
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    for line in foo.get(files):
        do_something_with_line(line)
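
For a quick end-to-end check of the caller-owns-the-files pattern, here is a minimal sketch that runs on its own. It is not the itertools recipe: `interleave` is a hypothetical stand-in with the same ordering, and `demo` and the temporary file names are made up for illustration.

```python
import os
import tempfile
from contextlib import ExitStack


def interleave(iterables):
    """Yield one item from each iterable in turn, dropping exhausted ones."""
    iterators = [iter(it) for it in iterables]
    sentinel = object()  # marks exhaustion without raising StopIteration
    while iterators:
        remaining = []
        for it in iterators:
            item = next(it, sentinel)
            if item is not sentinel:
                yield item
                remaining.append(it)
        iterators = remaining


def demo():
    """Interleave three small temporary files; the caller owns the ExitStack."""
    samples = [("a.txt", "a1\na2\na3\n"), ("b.txt", "b1\n"), ("c.txt", "c1\nc2\n")]
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, text in samples:
            path = os.path.join(tmp, name)
            with open(path, "w") as out:
                out.write(text)
            paths.append(path)
        with ExitStack() as stack:
            # Every file entered here is closed when the with block exits.
            files = [stack.enter_context(open(p)) for p in paths]
            return [line.rstrip("\n") for line in interleave(files)]
```

Calling `demo()` should return `['a1', 'b1', 'c1', 'a2', 'c2', 'a3']`: the shorter files drop out of the rotation as they run dry, just as in the recipe's `A D E B F C` example.
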
juanpa.arrivillaga

This would be tricky (or require some additional libraries) to do using a context manager, but it shouldn't be very difficult without one. open() takes the name of a single file, so assuming files_list is a list of strings naming the input files, this should work:

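def get(files_list):
  # Open every file up front; files_list is a list of file names.
  file_handles = [open(f, 'r') for f in files_list]
  while file_handles:
    # Iterate over a copy so that removing a handle does not skip the next one.
    for fd in list(file_handles):
      line = fd.readline()
      if line:
        yield line
      else:
        # EOF: close the exhausted file and drop it from the rotation.
        fd.close()
        file_handles.remove(fd)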

I'm assuming you want to keep going until every line is read from every file, with shorter files dropping off as they hit EOF.
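
One caveat with opening the files by hand: if the consumer stops reading early, any handles that have not reached EOF stay open. A rough sketch of one way to guard against that, assuming the same list-of-file-names input (`get_closing` is a hypothetical name, not part of the code above):

```python
def get_closing(files_list):
    """Yield lines round-robin and always close the handles on exit."""
    handles = [open(name, "r") for name in files_list]
    try:
        active = list(handles)
        while active:
            # Iterate over a copy so dropping a handle mid-pass is safe.
            for fd in list(active):
                line = fd.readline()
                if line:
                    yield line
                else:
                    active.remove(fd)  # EOF: drop this file from the rotation
    finally:
        # Runs on normal exhaustion and when the generator is closed early.
        for fd in handles:
            fd.close()
```

Calling `.close()` on the generator, or breaking out of a loop and then closing it, raises `GeneratorExit` at the paused `yield`, so the `finally` block still runs and closes every file.
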

Z4-tier