I have a question about the speed of Python. I have two lists of lists with data, which look like this:
GCA_NUMBER.VERSION name sth_else etc. (FILE A - 170k lines)
GCF_NUMBER.VERSION name sth_else etc. (FILE B - 450k lines)
The goal is to remove the entries from file A that also occur in file B, e.g.:
GCA_0000025.1
GCF_0000025.5
I only care about the NUMBER part, but I cannot lose the other information, such as the name.
I tried two approaches:
    for i in FILE_A:
        for j in FILE_B:
            if i[0] == j[0]:
                pass  # then sth
which took about 17 minutes, and the second:
    tmp_lst = [i[0] for i in FILE_B]
    for i in FILE_A:
        if i[0] not in tmp_lst:  # compare the first field, not the whole row
            pass  # then sth
which took about 13 minutes. Is there a faster way?
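For reference, both attempts scan all of file B for every row of file A, so one direction that might help is a set-based lookup. Below is a minimal sketch, not a tested solution for the real files. It assumes that each row is a list whose first field looks like `GCA_NUMBER.VERSION` or `GCF_NUMBER.VERSION`. The helper names `accession_number` and `drop_rows_present_in_b` and the sample rows are hypothetical.

```python
def accession_number(accession):
    """Return the NUMBER part of 'GCA_NUMBER.VERSION' or 'GCF_NUMBER.VERSION'."""
    # Drop the 'GCA_'/'GCF_' prefix, then drop the '.VERSION' suffix.
    return accession.split("_", 1)[1].split(".", 1)[0]


def drop_rows_present_in_b(file_a, file_b):
    """Keep the full rows of file_a whose NUMBER does not occur in file_b."""
    # One pass over file_b builds the set; each lookup is then O(1) on average,
    # instead of a linear scan through a list.
    numbers_in_b = {accession_number(row[0]) for row in file_b}
    return [row for row in file_a if accession_number(row[0]) not in numbers_in_b]


# Hypothetical sample rows in the assumed format (not real data).
file_a = [["GCA_0000025.1", "name_1"], ["GCA_0000031.2", "name_2"]]
file_b = [["GCF_0000025.5", "name_1"], ["GCF_0000099.1", "name_3"]]

unique_to_a = drop_rows_present_in_b(file_a, file_b)
```

Each kept row is returned whole, so the name and the other fields are preserved. Only the NUMBER is used for the comparison, so the `GCA`/`GCF` prefix and the version are ignored.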