I have a large CSV file (~250M lines) with the following structure:
ID1, ID2, value
A, B, 5
B,C, 8
C,B, 4
I want a table that tells me whether each pair (ID1, ID2) is reciprocated, i.e. whether the reversed pair (ID2, ID1) also appears somewhere in the file. The output should look something like:
ID1, ID2, Reciprocity
A,B,0
B,C,1
C,B,1
I would do this by creating a dictionary and checking whether the key ID2+ID1 is in it, but the dictionary becomes larger than my RAM. I've also tried networkx, but I run out of RAM before the graph can be built.
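For reference, the in-memory approach described above looks roughly like the sketch below. It uses a set of (ID1, ID2) tuples instead of concatenated string keys, and the function name `reciprocity_in_memory` is purely illustrative. It gives the right answer on the small sample, but it holds every pair in memory, which is exactly what fails at ~250M lines.

```python
import csv
import io


def reciprocity_in_memory(lines):
    """Return (id1, id2, flag) tuples; flag is 1 if (id2, id1) also occurs.

    Illustrative only: every pair is kept in RAM, so this does not scale
    to the full file.
    """
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the "ID1, ID2, value" header
    rows = [(r[0].strip(), r[1].strip()) for r in reader if r]
    seen = set(rows)  # all (ID1, ID2) pairs present in the file
    return [(a, b, int((b, a) in seen)) for a, b in rows]


# The sample data from the question, read from a string instead of a file.
sample = io.StringIO("ID1, ID2, value\nA, B, 5\nB,C, 8\nC,B, 4\n")
for row in reciprocity_in_memory(sample):
    print(*row, sep=",")
```

On the three sample rows this prints A,B,0 then B,C,1 then C,B,1, matching the desired output.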
Is there an approach that doesn't require loading the whole file into RAM, but that also isn't prohibitively slow from repeatedly reading the hard drive in a loop?