I have a small script that counts, from CSV input files, how many items of the first list are also in the second list. However, it takes a long time to run when there are many references.
data_1 = import_csv("test1.csv")
print(len(data_1))
data_2 = import_csv("test2.csv")
print(len(data_2))
data_to_keep = len([i for i in data_1 if i in data_2])
I just ran a test with 598756 items in the first list and 76612 in the second, and the script still hasn't finished.
As I'm still relatively new to Python, I would like to know if there is a faster way to achieve what I'm trying to do. Thank you for your help :)
EDIT: import_csv looks like this:
import csv

def import_csv(csvfilename):
    """Return the first column of every non-empty row in the CSV file."""
    data = []
    with open(csvfilename, "r", encoding="utf-8", errors="ignore") as scraped:
        reader = csv.reader(scraped, delimiter=',')
        for row in reader:
            if row:  # avoid blank lines
                data.append(row[0])
    return data
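For context on why this is slow: `i in data_2` on a list scans the list element by element, so with these sizes the comprehension can need on the order of tens of billions of string comparisons in the worst case. One direction that might help, sketched here under the assumption that exact string equality is the intended match and that duplicates in the first list should each be counted, is to build a `set` from the second list first. The helper name `count_common` is hypothetical and not part of the original script.

```python
def count_common(data_1, data_2):
    """Count items of data_1 that also appear in data_2.

    Assumption: duplicates in data_1 are counted each time, which matches
    the behaviour of the original list comprehension.
    """
    lookup = set(data_2)  # set membership is O(1) on average, versus O(n) for a list
    return sum(1 for item in data_1 if item in lookup)


# Tiny in-memory stand-ins for the first column of each CSV file
first = ["a", "b", "c", "b", "z"]
second = ["b", "c", "d"]
print(count_common(first, second))  # 3
```

With the real files this would be called as `count_common(data_1, data_2)`. Building the set costs one pass over the second list, after which each lookup no longer depends on its length.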