
I'm attempting to merge a group of pre-sorted files where every line in each file is an integer:

for line in heapq.merge(*files):

The merge completes without errors, but the lines are compared as strings, not integers. How can I force an integer comparison?

RedLeader

2 Answers


Try this:

for line in heapq.merge(*(map(int, file) for file in files)):

This doesn't make the comparison treat the strings as integers; it converts each line to an integer on the fly, so the merge yields integers rather than strings. If you need strings, you can convert the results back:

for line in map(str, heapq.merge(*(map(int, file) for file in files))):

For future reference: this is for Python 3, where map returns an iterator. In Python 2, map builds a full list, so replace it with itertools.imap to avoid reading every file into memory up front.
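
As a minimal sketch of the difference, the snippet below uses in-memory io.StringIO objects as stand-ins for the real files (the helper make_files and its sample values are assumptions for illustration only). It also shows the key parameter that heapq.merge accepts from Python 3.5 onward, which compares lines by their integer value while still yielding the original strings, newline included. Note that the str conversion above returns the digits without the trailing newline.

```python
import heapq
import io


def make_files():
    # Hypothetical stand-ins for the pre-sorted input files.
    return [io.StringIO("2\n9\n10\n"), io.StringIO("1\n30\n100\n")]


# Plain merge compares the raw lines as strings, so the result
# is not in numeric order.
as_strings = list(heapq.merge(*make_files()))

# Converting on the fly yields integers in numeric order.
as_ints = list(heapq.merge(*(map(int, f) for f in make_files())))

# Python 3.5+: key=int compares numerically but yields the original lines.
keyed = list(heapq.merge(*make_files(), key=int))
```

The list() calls are only there to make the results easy to inspect; with large files you would iterate over the merge directly.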

Stefan Pochmann

Try reading the files and converting each line into an integer. This assumes that all data fit into memory.

def read_as_int_list(file_name):
    with open(file_name) as fobj:
        return [int(line) for line in fobj]

A generator that reads and converts one line at a time should be more memory efficient:

def read_as_ints(file_name):
    with open(file_name) as fobj:
        for line in fobj:
            yield int(line)

Usage:

import heapq

# One lazy generator per file; each yields its integers one line at a time.
files = (read_as_ints(name) for name in list_of_file_names)
for line in heapq.merge(*files):
    print(line)
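
For a quick end-to-end check, here is a hedged, self-contained sketch that writes two small sorted files to a temporary location and merges them with the generator above. The helper write_temp_file and the sample values are hypothetical and only serve the demonstration.

```python
import heapq
import os
import tempfile


def read_as_ints(file_name):
    # Generator: yields one integer per line, reading the file lazily.
    with open(file_name) as fobj:
        for line in fobj:
            yield int(line)


def write_temp_file(numbers):
    # Hypothetical helper: writes the given integers, one per line,
    # and returns the path of the file it created.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.writelines(f"{n}\n" for n in numbers)
        return tmp.name


list_of_file_names = [write_temp_file([2, 9, 10]), write_temp_file([1, 30, 100])]
try:
    files = (read_as_ints(name) for name in list_of_file_names)
    # Collected into a list only so the result can be inspected.
    merged = list(heapq.merge(*files))
finally:
    for name in list_of_file_names:
        os.remove(name)
```

With real data you would loop over heapq.merge directly instead of building a list, so that only the current line of each file is held in memory.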
Mike Müller
  • Should have added that condition. Cannot store the values in memory and that's why I'm trying to use this implementation to read the top of each input stream on the fly. – RedLeader May 09 '15 at 22:15
  • @RedLeader Added a version that should work with large files. Let me know if it works for you. – Mike Müller May 09 '15 at 22:31