I have to look up a list of thousands of gene names (genelist.txt, one column) in a database file (database.txt, multiple columns). Any line of database.txt that contains at least one gene name from genelist.txt should be extracted to output.txt.
I used to do it like this:
findstr /G:genelist.txt database.txt >output.txt
It works well and is fast. However, I found out today that the final output depends on the order of the gene names in genelist.txt. An unsorted gene list gives one result, while sorting the gene list and searching again gives a different result with more lines. But even with the sorted gene list, output.txt still does not contain all matching lines; some records are missing. I only noticed this after comparing with the result from
grep -f "genelist.txt" database.txt > output.txt
The grep results are identical whether or not the gene list is sorted, but grep is a bit slower than findstr.
I was wondering why this happens. Can I add any arguments to findstr to make it return a complete list of results?
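For reference, here is a minimal Python sketch of the kind of check that exposes the missing records. It is not how findstr or grep work internally. It assumes each gene name should be matched as a case-sensitive literal substring (grep -f without -F treats each line as a regular expression instead), that the files are UTF-8 text, and the function names are hypothetical ones chosen for illustration.

```python
from pathlib import Path


def load_genes(path):
    """Read one gene name per line, dropping blank lines and duplicates."""
    text = Path(path).read_text(encoding="utf-8")  # encoding is an assumption
    return {line.strip() for line in text.splitlines() if line.strip()}


def matching_lines(genes, db_lines):
    """Return database lines containing at least one gene name as a literal substring."""
    return [line for line in db_lines if any(gene in line for gene in genes)]


def missing_lines(expected, actual):
    """Return lines present in expected but absent from actual, keeping order."""
    seen = set(actual)
    return [line for line in expected if line not in seen]


def report_missing(gene_path, db_path, output_path):
    """Print every database line that should have matched but is not in the tool's output."""
    genes = load_genes(gene_path)
    db_lines = Path(db_path).read_text(encoding="utf-8").splitlines()
    tool_output = Path(output_path).read_text(encoding="utf-8").splitlines()
    for line in missing_lines(matching_lines(genes, db_lines), tool_output):
        print(line)
```

Calling report_missing("genelist.txt", "database.txt", "output.txt") on the findstr output would list the records it skipped. The brute-force scan is slow for thousands of genes, but its result cannot depend on the order of the gene list, which makes it usable as a baseline.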