I have a 100GB text file. The data in that file is in this format:
email||username||password_hash
I am testing on a 6GB file which I made separately by splitting the bigger file.
I am running grep to match the lines and output them.
I used grep. It takes around 1 minute 22 seconds.
I used other options with grep, like LC_ALL=C and -F, but the time only dropped to 1 minute 15 seconds, which is still not good for a 6GB file.
Then I used ripgrep; it takes 27 seconds on my machine, still not good.
Then I used ripgrep with the -F option; it takes 14 seconds, still not good.
I also tried ag (the silver searcher), but I found that it won't work for files bigger than 2 GB.
I need help finding a command-line tool (or language) that gives better results, or some way to take advantage of the data format to search by column. For example, if I am searching by username, then instead of matching the whole line, I would search only the second column. I tried that using awk, but it is still slower.
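
To make the column idea concrete, here is a minimal Python sketch of a search restricted to the username column. The function name search_by_username is hypothetical, and the sketch assumes that neither the email nor the username ever contains the || separator. It only illustrates the matching logic and is not claimed to be faster than ripgrep.

```python
SEP = b"||"


def search_by_username(path, username):
    """Yield lines of `path` whose second ||-separated field equals `username`.

    Assumption (not from the original data description): the email and
    username fields never contain the separator themselves.
    """
    needle = username.encode("utf-8")
    # The username column always sits between two separators, so a cheap
    # substring check rejects most lines before any splitting is done.
    wrapped = SEP + needle + SEP
    with open(path, "rb") as fh:  # bytes mode avoids decoding every line
        for raw in fh:
            if wrapped not in raw:
                continue
            line = raw.rstrip(b"\r\n")
            # maxsplit=2 keeps everything after the second separator
            # (the password hash) together as one field.
            fields = line.split(SEP, 2)
            if len(fields) == 3 and fields[1] == needle:
                yield line.decode("utf-8", errors="replace")
```

A call such as list(search_by_username("sample.txt", "bob")) would return only the lines whose username is exactly bob, and not lines where bob merely appears inside the email or the hash.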