I have two files. The first is a list of items, one per line. The second is a TSV file with many items per line. Some lines in the second file may contain items that are listed in the first file, and I need to generate a list of the lines from the second file that contain any of those items.
grep -f is being finicky for me, so I decided to write my own Python script. In it, the big list is the second file and the tiny list is the first file. This is what I came up with:
    def main():
        desired_subset = []
        small_list = open('tiny_list.txt','r')
        big_list = open('big_list.tsv','r')
        for i in small_list.readlines():
            i = i.rstrip('\n')
            for big_line in big_list:
                if i in big_line:
                    if i not in desired_subset:
                        desired_subset.append(big_line)
        print(desired_subset)
        print(len(desired_subset))
    main()
The problem is that the for loop is only reading through the first line. Any suggestions?
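One likely cause, for what it's worth: a file object in Python is an iterator, so the inner loop over big_list consumes the whole file while checking the first item, and every later pass finds nothing left to read. Below is a minimal sketch of one way around that, assuming plain substring matching is what is wanted. The function name `matching_lines` and its parameters are made up for illustration. It reads the small file into a list first and then makes a single pass over the big file.

```python
def matching_lines(small_path, big_path):
    """Return the lines of big_path that contain any item from small_path.

    Hypothetical helper: matching is a plain substring test, as in the
    script above, so an item "cat" would also match a line with "catalog".
    """
    with open(small_path) as small_file:
        # Skip blank lines: an empty string is a substring of every line.
        items = [line.rstrip('\n') for line in small_file if line.strip()]

    matches = []
    with open(big_path) as big_file:
        # Single pass over the big file, so its iterator is read only once.
        for big_line in big_file:
            if any(item in big_line for item in items):
                matches.append(big_line.rstrip('\n'))
    return matches
```

Calling `matching_lines('tiny_list.txt', 'big_list.tsv')` and printing the result and its length would mirror the two print calls in the script above. Because each line of the big file is visited once, a line that contains several items is still added only once, with no separate membership check.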