The key keyword must be a callable. It is called for every entry in the input sequence, and a lambda is an easy way to create such a callable:
sorted(..., key=lambda line: stringsplit(line))
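For illustration, here is a minimal sketch; the stringsplit() below is only a stand-in that returns the leading whitespace-separated field of each line, since only you know what your real function does:

# Stand-in key function; your real stringsplit() may differ.
def stringsplit(line):
    return line.split(' ', 1)[0]

lines = [
    '2013-06-02 second entry\n',
    '2013-06-01 first entry\n',
    '2013-06-03 third entry\n',
]

for line in sorted(lines, key=lambda line: stringsplit(line)):
    print(line, end='')
# 2013-06-01 first entry
# 2013-06-02 second entry
# 2013-06-03 third entry

Since the lambda does nothing but forward the line to stringsplit(), you could also pass key=stringsplit directly.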
I would be extremely wary of sorting the output of fileinput with many large files though. sorted() must read all lines into memory to be able to sort them. If your files are many and/or large, you'll use up all available memory, eventually leading to a MemoryError exception.
Use a different method to pre-sort your logs first. You can use the UNIX sort tool, or use an external sorting technique instead.
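If you'd rather stay in Python, here is a rough sketch of an external sort: sort manageable chunks in memory, spill each chunk to a temporary file, then merge the sorted chunks lazily. The chunk size and the key function are assumptions you'd tune for your data, and heapq.merge() only accepts a key argument on Python 3.5 and up:

import heapq
import itertools
import tempfile

def _spill_sorted_chunk(chunk, key):
    """Sort one in-memory chunk and write it out to a temporary file."""
    chunk.sort(key=key)
    tmp = tempfile.TemporaryFile(mode='w+')
    tmp.writelines(chunk)
    tmp.seek(0)
    return tmp

def external_sort(lines, key, chunk_size=100000):
    """Sort an arbitrarily large iterable of lines with bounded memory use."""
    it = iter(lines)
    chunks = []
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        chunks.append(_spill_sorted_chunk(chunk, key))
    # heapq.merge() lazily merges the already-sorted chunk files, so only
    # one line per chunk is held in memory at a time.
    return heapq.merge(*chunks, key=key)

You'd then iterate over something like external_sort(fileinput.input(files), key=stringsplit) instead of sorting everything in one go.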
If your input files are already sorted, you can merge them using the same key:
import operator

def mergeiter(*iterables, **kwargs):
    """Given a set of sorted iterables, yield the next value in merged order"""
    iterables = [iter(it) for it in iterables]
    # Track the current head value of each iterable so the smallest head
    # can be picked on every iteration.
    iterables = {i: [next(it), i, it] for i, it in enumerate(iterables)}

    if 'key' not in kwargs:
        key = operator.itemgetter(0)
    else:
        key = lambda item, key=kwargs['key']: key(item[0])

    while True:
        value, i, it = min(iterables.values(), key=key)
        yield value
        try:
            iterables[i][0] = next(it)
        except StopIteration:
            # This iterable is exhausted; drop it and end the generator
            # once no iterables remain.
            del iterables[i]
            if not iterables:
                return
then pass in your open file objects:
import glob

files = [open(f) for f in glob.glob('logs/*')]
for line in mergeiter(*files, key=lambda line: stringsplit(line)):
    # lines are looped over in merged order.
But you need to make certain that your stringsplit() function returns values in the same order in which the lines are already sorted within the input log files.
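On Python 3.5 and newer you can also skip the hand-rolled generator entirely; the standard library's heapq.merge() accepts the same key argument and merges sorted inputs lazily:

import glob
import heapq

files = [open(f) for f in glob.glob('logs/*')]
for line in heapq.merge(*files, key=lambda line: stringsplit(line)):
    ...  # lines arrive in merged order, just as with mergeiter()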