I have a large file (5 GB) called my_file. I have a list called my_list. What is the most efficient way to read each line in the file and, if an item from my_list matches an item from a line in my_file, create a new list called matches that contains items from the lines in my_file AND items from my_list where a match occurred. Here is what I am trying to do:
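def calc(my_file, my_list):
    matches = []
    my_file.seek(0, 0)  # rewind to the start of the file
    for i in my_file:
        # split each tab-separated line into its fields
        i = i.rstrip('\n').split('\t')
        for v in my_list:
            # compare the second item of v with the third field of the line
            if v[1] == i[2]:
                item = v[0], i[1], i[3]
                matches.append(item)
    return matches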
Here are some lines in my_file:
lion 4 blue ch3
sheep 1 red pq2
frog 9 green xd7
donkey 2 aqua zr8
Here are some items in my_list:
intel yellow
amd green
msi aqua
The desired output, a list of lists, in the above example would be:
[['amd', 9, 'xd7'], ['msi', 2, 'zr8']]
My code currently works, albeit really slowly. Would using a generator or serialization help? Thanks.
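
For reference, one common way to avoid the inner loop over my_list is to index it by its second field once, so that each line of the file costs a single dictionary lookup instead of a scan of the whole list. The following is only a minimal sketch under some assumptions: the file is tab-separated with at least four fields per line, the second field is always an integer (as in the desired output above), and the names calc_indexed and index are hypothetical, not part of the original code.

```python
import io
from collections import defaultdict


def calc_indexed(my_file, my_list):
    # Build the lookup once: second item of each my_list entry -> first items.
    index = defaultdict(list)
    for name, key in my_list:
        index[key].append(name)

    matches = []
    my_file.seek(0, 0)
    for line in my_file:  # the file object yields one line at a time
        fields = line.rstrip('\n').split('\t')
        # Assumption: skip lines that do not have at least four fields.
        if len(fields) < 4:
            continue
        # One dictionary lookup replaces the scan over all of my_list.
        for name in index.get(fields[2], ()):
            # Assumption: the second field is numeric, so int() succeeds.
            matches.append([name, int(fields[1]), fields[3]])
    return matches


# Sample data from the question, assumed to be tab-separated.
sample = io.StringIO(
    "lion\t4\tblue\tch3\n"
    "sheep\t1\tred\tpq2\n"
    "frog\t9\tgreen\txd7\n"
    "donkey\t2\taqua\tzr8\n"
)
my_list = [['intel', 'yellow'], ['amd', 'green'], ['msi', 'aqua']]
print(calc_indexed(sample, my_list))
```

With the sample data this prints [['amd', 9, 'xd7'], ['msi', 2, 'zr8']]. The file is still read line by line, so memory use is driven by the size of matches and of the index rather than by the 5 GB file itself.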