Since `findstr` is a native application, I would not be surprised if it has better search performance than almost anything one could implement in PowerShell code or even a compiled .NET module. The problem with `findstr` is that it is oblivious to the structure of your data. That is, if you search for the record with ID `123`, it will happily return records with ID `1234` or address `"123 Main Street"` as false positives. You could potentially use the `/B` (match at the beginning of a line) or `/R` (regular expression) switches to combat this, but that still doesn't help when you search for an ID that doesn't exist; `findstr` only stops searching when it reaches the end of the file.
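To make the false-positive problem concrete, here is a minimal sketch (in Python, for brevity) contrasting a plain "does the line contain this text" search with one that compares only the ID field. The record layout (ID first, then a comma), the sample records, and the function names are all hypothetical.

```python
# Hypothetical records: the ID is the first comma-separated field.
records = [
    "122,10 Oak Avenue",
    "123,42 Elm Street",
    "1234,7 Pine Road",
    "5000,123 Main Street",
]


def substring_matches(lines, text):
    """Plain text search: any line that contains `text` anywhere is a match."""
    return [line for line in lines if text in line]


def id_matches(lines, target_id):
    """Structure-aware search: compare only the ID field, and compare it exactly."""
    return [line for line in lines if line.split(",", 1)[0] == target_id]


print(substring_matches(records, "123"))
# ['123,42 Elm Street', '1234,7 Pine Road', '5000,123 Main Street']
print(id_matches(records, "123"))
# ['123,42 Elm Street']
```

Note that both functions still visit every line; knowing the structure removes the false positives but not the cost of scanning to the end.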
Your ability to perform an optimized search depends on the specific format of the text file. If lines are fixed-length, meaning you can instantly seek to the `$n`th line by simply calculating `$n * $lineLength`, then you could quickly search the file for an ID using a binary search.
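A minimal sketch of the fixed-length case, written in Python for brevity. It assumes every line is exactly `line_length` bytes including its `CrLf`, starts with a zero-padded integer ID followed by a comma, and that lines are sorted ascending by ID. The function name and layout are hypothetical, and `io.BytesIO` stands in for a file opened with `open(path, "rb")`; in .NET the equivalent jump would be a `FileStream.Seek`.

```python
import io


def find_fixed(f, line_length, target_id):
    """Binary search over fixed-length lines sorted ascending by a leading integer ID.

    f: seekable binary file object
    line_length: bytes per line, including the line terminator
    Returns the matching line (without its terminator) or None.
    """
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // line_length - 1   # range of candidate line numbers
    while lo <= hi:
        mid = (lo + hi) // 2
        f.seek(mid * line_length)              # jump straight to line number `mid`
        line = f.read(line_length).decode("ascii").rstrip("\r\n")
        line_id = int(line.split(",", 1)[0])
        if line_id == target_id:
            return line
        if line_id < target_id:
            lo = mid + 1                       # target can only be after `mid`
        else:
            hi = mid - 1                       # target can only be before `mid`
    return None


# Hypothetical data: 6-digit ID, comma, 13-char padded field, CrLf = 22 bytes per line.
LINE_LENGTH = 22
ids = [3, 17, 123, 1234, 5000]
data = b"".join(f"{i:06d},{'rec' + str(i):<13}\r\n".encode("ascii") for i in ids)
f = io.BytesIO(data)  # stands in for open(path, "rb")
print(find_fixed(f, LINE_LENGTH, 123))
print(find_fixed(f, LINE_LENGTH, 124))  # not in the file
```

Only about log2(number of lines) lines are ever read, and a missing ID is detected just as quickly as a present one.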
If lines are variable-length, then there's really no simple way to search the file efficiently other than line by line. Even if you've read enough of a line to know the ID doesn't match, you still need to read the rest of the line to know where the next line begins. At best, since the lines are sorted by ID, you can abort the search as soon as you encounter a line with an ID greater than the one you're searching for, because that ID won't be found.
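A minimal sketch of that line-by-line scan with early abort, in Python for brevity. It assumes each line starts with an integer ID followed by a comma and that lines are sorted ascending by ID; the function name and sample data are hypothetical, and `io.StringIO` stands in for `open(path)`.

```python
import io


def find_sorted_linear(f, target_id):
    """Line-by-line search of lines sorted ascending by a leading integer ID.

    Stops as soon as it passes the position where `target_id` would have to be.
    Returns the matching line (without its terminator) or None.
    """
    for line in f:
        line_id = int(line.split(",", 1)[0])
        if line_id == target_id:
            return line.rstrip("\r\n")
        if line_id > target_id:
            return None  # IDs are sorted, so target_id cannot appear later
    return None  # reached end of file


# Hypothetical variable-length records.
data = "3,a\n17,bb\n123,ccc\n1234,dddd\n5000,123 Main Street\n"
print(find_sorted_linear(io.StringIO(data), 123))
print(find_sorted_linear(io.StringIO(data), 124))  # stops after reading the "1234" line
```

In the worst case (an ID larger than every ID in the file) this still reads the whole file, which is what the binary search approaches avoid.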
In the past I have been able to employ a binary search on text files with variable-length lines (fixed-width characters, i.e. a fixed-width encoding, would be very helpful, too, if not required). The key is that, for each iteration of the search, you calculate your next offset; if it happens to land on the beginning of a line, great; if not, seek backwards until you can identify the character that begins the line (e.g., one preceded by a `CrLf`). Once you're positioned on the start of a line, you can read the ID and determine whether it's a match or in which direction the next iteration of the search needs to look.
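A minimal sketch of that approach, in Python for brevity, under several assumptions: a single-byte encoding such as ASCII (so byte offsets line up with characters), lines terminated by `\n` or `CrLf`, each line starting with an integer ID followed by a comma, and lines sorted ascending by ID. The function names are hypothetical, and `io.BytesIO` stands in for `open(path, "rb")`. Rather than stepping back one byte at a time, `line_start` reads backwards in chunks and looks for the last line feed before the probe offset.

```python
import io


def line_start(f, pos, chunk=4096):
    """Return the offset of the first byte of the line containing byte `pos`."""
    while pos > 0:
        read_from = max(0, pos - chunk)
        f.seek(read_from)
        block = f.read(pos - read_from)       # bytes strictly before `pos`
        nl = block.rfind(b"\n")
        if nl != -1:
            return read_from + nl + 1         # line begins right after that LF
        pos = read_from                       # no LF yet; keep moving backwards
    return 0                                  # reached the start of the file


def find_variable(f, target_id):
    """Binary search over variable-length lines sorted ascending by a leading integer ID."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell()   # the target's line, if present, starts in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        start = line_start(f, mid)            # back up to the start of the line
        f.seek(start)
        line = f.readline()
        line_id = int(line.split(b",", 1)[0])
        if line_id == target_id:
            return line.decode("ascii").rstrip("\r\n")
        if line_id < target_id:
            lo = start + len(line)            # keep searching after this line
        else:
            hi = start                        # keep searching before this line
    return None


# Hypothetical sample data with CrLf line endings.
data = b"3,a\r\n17,bb\r\n123,ccc\r\n1234,dddd\r\n5000,123 Main Street\r\n"
f = io.BytesIO(data)
print(find_variable(f, 123))   # 123,ccc
print(find_variable(f, 124))   # None
```

Because `lo` only ever moves to a line start, backing up from `mid` never goes below `lo`, and each probe at least halves the remaining byte range, so the number of probes grows with the logarithm of the file size rather than with the number of lines.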
It's definitely not a quick and simple solution to write, but, depending on how huge "huge" is, it could significantly speed up searching your file. That said, at that point it might be better to invest your development time in switching to a more search-friendly way of storing your data, if at all possible.