Given two files, file1.txt:
abc def \t 123 456
jkl mno \t 987 654
foo bar \t 789 123
bar bar \t 432
and file2.txt:
foo bar \t hello world
abc def \t good morning
xyz \t 456
The task is to extract the lines whose first column appears in both files and join them, producing:
abc def \t 123 456 \t good morning
foo bar \t 789 123 \t hello world
I can do it in Python like this:
from io import StringIO
file1 = """abc def \t 123 456
jkl mno \t 987 654
foo bar \t 789 123
bar bar \t 432"""
file2 = """foo bar \t hello world
abc def \t good morning
xyz \t 456"""
map1, map2 = {}, {}
# Build a dict per file, keyed by the first tab-separated column.
with StringIO(file1) as fin1:
    for line in fin1:
        one, two = line.strip().split('\t')
        map1[one] = two
with StringIO(file2) as fin2:
    for line in fin2:
        one, two = line.strip().split('\t')
        map2[one] = two
# Emit the joined row for every key present in both dicts.
for k in set(map1).intersection(set(map2)):
    print('\t'.join([k, map1[k], map2[k]]))
The actual task files have billions of lines. Is there a faster solution that avoids loading everything into memory and keeping the hashmaps/dictionaries? Maybe using unix/bash commands? Would pre-sorting the files help?
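To make the pre-sorting idea concrete, here is a minimal sketch of the streaming merge join I have in mind, assuming both files have already been sorted on the first column with the same collation the comparison below uses (e.g. LC_ALL=C sort -t $'\t' -k1,1), and that keys are unique within each file; the .sorted.txt file names are placeholders:

def merge_join(path1, path2, sep='\t'):
    # Stream two files that are sorted on their first column and yield
    # the joined row for every key present in both. Only the current
    # line of each file is held in memory.
    with open(path1) as f1, open(path2) as f2:
        line1, line2 = f1.readline(), f2.readline()
        while line1 and line2:
            key1, rest1 = line1.rstrip('\n').split(sep, 1)
            key2, rest2 = line2.rstrip('\n').split(sep, 1)
            if key1 == key2:
                yield sep.join([key1, rest1, rest2])
                line1, line2 = f1.readline(), f2.readline()
            elif key1 < key2:
                line1 = f1.readline()  # advance whichever file is behind
            else:
                line2 = f2.readline()

for row in merge_join('file1.sorted.txt', 'file2.sorted.txt'):
    print(row)

If I understand the man pages correctly, this is roughly what join(1) does internally, so something like join -t $'\t' <(sort file1.txt) <(sort file2.txt) should behave the same; I haven't verified which approach scales better to billions of lines.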