I have a very large file (100 MB) containing strings, one per line, and I am looking for a fast way to check whether a given string is present in the file. The whole line must match the input string exactly.
My idea is that a program loads the file once and can then be queried as to whether a string exists or not.
void loadfile(string filename);
bool stringAvailable(string str);
The loadfile() function does not need to be fast, since it is called only occasionally. But stringAvailable() needs to be as fast as possible.
So far I have tried:
1. Letting the Linux command line tools do the job for me:
system("cat lookup | grep \"^example$\"");
=> Not very fast.
2. Using a MySQL database instead of a file (I tried MyISAM and InnoDB) and querying it with SELECT count(*) FROM lookup WHERE str = 'xyz'
=> Very fast, but it could still be faster. Also, it would be better to have a program that does not depend on a DBMS.
3. Keeping an array of strings (string[] ary) and comparing all values in a for() loop.
=> Not very fast. I guess it can be optimized with hash tables, which I am currently experimenting with.
Are there other possibilities to make the process even more performant?