
I have to look up a list of thousands of gene names (genelist.txt, one column) in a database file (database.txt, multiple columns). Every line that contains at least one gene name from genelist.txt should be extracted to output.txt.
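For reference, the intended behaviour can be written down as a small Python sketch. It assumes one gene name per line in genelist.txt, UTF-8 or plain ASCII files, and plain case-sensitive substring matching; the function names `load_gene_names`, `extract_matching_lines` and `filter_files` are hypothetical. Every database line is checked against the whole gene list, so the result cannot depend on the order of genelist.txt. This is a naive lines × genes check, meant as an order-independent baseline for comparison rather than a fast replacement for findstr or grep.

```python
from typing import List


def load_gene_names(path: str) -> List[str]:
    """Read one gene name per line, dropping blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def extract_matching_lines(genes: List[str], database_lines: List[str]) -> List[str]:
    """Keep every database line that contains at least one gene name as a substring.

    Each line is checked against the complete gene list, so the result does not
    depend on the order of the names in the gene list.
    """
    return [line for line in database_lines if any(gene in line for gene in genes)]


def filter_files(gene_path: str, db_path: str, out_path: str) -> int:
    """Write the matching database lines to out_path and return how many were written."""
    genes = load_gene_names(gene_path)
    with open(db_path, encoding="utf-8") as db:
        lines = db.read().splitlines()
    matches = extract_matching_lines(genes, lines)
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(line + "\n" for line in matches)
    return len(matches)
```

Usage would be something like `filter_files("genelist.txt", "database.txt", "output.txt")`. The test is an unanchored substring match, the same kind of match `grep -f` performs for names without regex metacharacters, so a name such as EGFR also matches a line containing EGFR2.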

I used to do it like this:

findstr /G:genelist.txt database.txt >output.txt

It works well and is fast. However, I found out today that the output depends on the order of the genes in genelist.txt: an unsorted gene list gives one result, and sorting the gene list and searching again gives a result with more lines. But even with the sorted list, output.txt still does not contain all matching lines; some records are missing. I only noticed this after comparing it with the result from

grep -f "genelist.txt" database.txt > output.txt

The grep results are identical whether or not the gene list is sorted, but grep is a bit slower than findstr.
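To see exactly which records findstr dropped, the two outputs can be saved under different names and compared. The sketch below assumes the findstr result was saved as output_findstr.txt and the grep result as output_grep.txt (both hypothetical names, since the commands above both write to output.txt) and that both files are UTF-8 or plain ASCII. `missing_lines` reports lines present in the grep output but not in the findstr output, keeping duplicates, and `genes_in_line` lists which gene names occur in a missed line, giving a small search list to retry with findstr. All function names are hypothetical.

```python
from collections import Counter
from typing import List


def read_lines(path: str) -> List[str]:
    """Read a text file into lines; splitlines() also strips Windows CRLF endings."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def missing_lines(reference: List[str], candidate: List[str]) -> List[str]:
    """Return lines present in `reference` but absent from `candidate`.

    Counter subtraction keeps only positive counts, so a line that occurs twice
    in the reference but once in the candidate is reported once.
    """
    missing = Counter(reference) - Counter(candidate)
    result = []
    # Walk the reference so the report follows the reference file's order.
    for line in reference:
        if missing[line] > 0:
            result.append(line)
            missing[line] -= 1
    return result


def genes_in_line(line: str, genes: List[str]) -> List[str]:
    """List the gene names that occur as substrings in a single database line."""
    return [gene for gene in genes if gene in line]
```

For example, `missing_lines(read_lines("output_grep.txt"), read_lines("output_findstr.txt"))` would list the records findstr missed.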

Why does this happen? Are there any options I can add to findstr to make it return the complete list of results?

Mofi
Nengchi Te
  • Related: [Why doesn't this FINDSTR example with multiple literal search strings find a match?](http://stackoverflow.com/a/8921279/3439404) by @dbenham. – JosefZ Mar 25 '16 at 20:04
  • Is it possible to create a [mcve]? I.e., can you reproduce the issue if database.txt contains only a few lines (one of which being a problematic one that is missed by findstr)? – Heinzi Mar 26 '16 at 08:38
  • Thank you very much for the post; it seems that the Windows Findstr command has some built-in bugs. @JosefZ – Nengchi Te Mar 30 '16 at 12:25
  • I have tried that. I found the lines that Findstr missed, then searched again using only the gene names in those lines, and it worked; but with the long list, it missed some lines. The missing gene names had no clear pattern, so I guess it may be some kind of memory error in Findstr, and judging from JosefZ's reply, some weird things do happen in Findstr. @Heinzi – Nengchi Te Mar 30 '16 at 12:44
