I have a CSV file of 8-12 GB, and I would like to be able to search its first column and retrieve the whole row whenever there is a match. Each time I do this, I need to look up a set of more than 100K keys and retrieve the corresponding record for each of them.
There are a few approaches I can think of:
1) Run a simple grep for each key against the file ==> 100K grep commands.
2) Build a SQL database and index the first column, then either: a) search for each key with its own SELECT query, or b) insert all the keys into a temporary table and do a set-membership join (sketched below).
3) Build a hash table, such as a Python dictionary, and look it up by key. But I would have to load it into memory every time I run a bulk of queries (I don't want it to occupy memory permanently); see the second sketch below.
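For option 2b, this is roughly what I have in mind, a minimal sketch using SQLite. The file names, table names, and two-column layout are only placeholders for illustration:

```python
# Sketch of approach 2b: indexed table + temporary key table + one join.
import csv
import sqlite3

con = sqlite3.connect("records.db")      # hypothetical database file
cur = con.cursor()

# One-time load: copy the CSV into an indexed table.
cur.execute("CREATE TABLE IF NOT EXISTS records (key TEXT, rest TEXT)")
with open("data.csv", newline="") as f:  # hypothetical CSV path
    reader = csv.reader(f)
    cur.executemany(
        "INSERT INTO records VALUES (?, ?)",
        ((row[0], ",".join(row[1:])) for row in reader),
    )
cur.execute("CREATE INDEX IF NOT EXISTS idx_key ON records(key)")
con.commit()

# Per batch: put the 100K keys in a temp table and join once.
keys = ["k1", "k2", "k3"]  # placeholder for the real key set
cur.execute("CREATE TEMP TABLE wanted (key TEXT PRIMARY KEY)")
cur.executemany("INSERT OR IGNORE INTO wanted VALUES (?)", ((k,) for k in keys))
matches = cur.execute(
    "SELECT r.* FROM records r JOIN wanted w ON r.key = w.key"
).fetchall()
con.close()
```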
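And for option 3, a minimal sketch of the dictionary idea. It keys the whole file on the first column, which is exactly the memory cost I'm worried about; the path is again a placeholder:

```python
# Sketch of approach 3: dict keyed on the first column, built per batch.
import csv

def build_index(csv_path):
    """Map first-column key -> full row for every line of the CSV."""
    index = {}
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if row:
                index[row[0]] = row
    return index

# Per batch of queries: load once, then look up each key.
# index = build_index("data.csv")                       # hypothetical path
# matches = {k: index[k] for k in keys if k in index}
```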
I'm not sure which of these methods is more efficient. Are there better options I'm not aware of?