I have two CSV files whose rows can be matched by the value in one column (after some tweaking of that column). For each match I want to take some values from both rows and build a new, combined row.
I thought of a simple script that reads both files with csv.DictReader and then uses a nested for loop:
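
    for row1 in csv1:
        # csv2 must be a list (e.g. list(csv.DictReader(f2))): a DictReader
        # is exhausted after one pass, so the inner loop would only run once.
        for row2 in csv2:
            if row1['someID'] == row2['someID']:
                newdict = ...  # build the combined row, etc.
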
However, one file has 9 million rows and the other has 500k rows, so this nested loop would run 9 × 10^6 × 5 × 10^5 = 4.5 × 10^12 iterations. Hence my question: what is a fast way to match them?
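For comparison, here is a rough count of the work, assuming (this is not from the question) that the 500k-row file is first loaded into a hash table (a Python dict) keyed by the tweaked someID and the 9M-row file is then streamed through once. Here k is the number of output rows, which any approach must emit because every matching pair produces a row:

$$
\underbrace{(9\times10^{6})\cdot(5\times10^{5})}_{\text{nested loops}} = 4.5\times10^{12}
\qquad\text{vs.}\qquad
\underbrace{5\times10^{5} + 9\times10^{6}}_{\text{build index + one pass}} + k \approx 9.5\times10^{6} + k
$$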
Important:
- The 'someID' column on which they are matched is not unique in either CSV.
- I want one output row for every matching pair. So if a 'someID' appears twice in csv1 and 3 times in csv2, I expect 6 rows with that 'someID' in the final result.
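One common way to meet both requirements is a hash join. The following is a minimal sketch under a few assumptions: the smaller file fits in memory, `normalize_id` is a hypothetical placeholder for the unspecified "tweaking" of someID, and the merge rule (values from the larger file win when both files share a column name) is also just an illustration. Only the standard library is used (`csv`, `collections.defaultdict`).

```python
import csv
import io
from collections import defaultdict


def normalize_id(raw):
    # Hypothetical stand-in for the "tweaking" of someID; replace with the real rule.
    return raw.strip()


def hash_join(small_rows, large_rows, key="someID"):
    """Yield one combined dict per matching (small, large) pair.

    Indexes the smaller input in a dict of lists, then streams the larger
    input once, so no row is compared against every row of the other file.
    """
    index = defaultdict(list)
    for row in small_rows:
        index[normalize_id(row[key])].append(row)

    for big in large_rows:
        # Every small row sharing this key pairs with this big row, which
        # yields the per-key cross product (2 x 3 occurrences -> 6 rows).
        # .get() does not insert missing keys into the defaultdict.
        for small in index.get(normalize_id(big[key]), ()):
            combined = dict(small)
            combined.update(big)  # hypothetical merge rule: large-file values win
            yield combined


# Tiny in-memory demo; for real files, pass csv.DictReader(open(path, newline="")).
small_csv = io.StringIO("someID,a\n1,x\n1,y\n2,z\n")
large_csv = io.StringIO("someID,b\n1,p\n1,q\n1,r\n3,s\n")
result = list(hash_join(csv.DictReader(small_csv), csv.DictReader(large_csv)))
print(len(result))  # 6
```

Because `hash_join` is a generator, with the real 9M-row file you can feed its output straight into a `csv.DictWriter` instead of collecting it in a list, so only the 500k-row index has to stay in memory.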