I need to search a potentially very large structure of byte data (up to 4GB) for a given hex value. The issue is that the hex string input can be of any length, including an odd number of digits, e.g. "ABC". Instead of converting the byte data to string data and performing a string search (which I've done, FWIW), I'm wondering about a possibly better-performing algorithm that searches essentially nibble-by-nibble.
Given the size of the source data, the search may be performed in segments, of course. The output is ideally an enumerable of the starting indexes or addresses of the matches. For the purposes of this question, let's assume the data structure is a byte[] and the output is an int index plus a nibble offset (e.g. a bool firstNibbleMatch).
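To make the question concrete, here's a rough sketch of what I mean (the brute-force inner loop is just for illustration, not the optimized version I'm asking about; the name firstNibbleMatch is the one from above, everything else is placeholder):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class NibbleSearch
{
    // Convert a hex string (possibly odd length, e.g. "ABC") into nibble values 0..15.
    static byte[] ToNibbles(string hex) =>
        hex.Select(c => (byte)Convert.ToInt32(c.ToString(), 16)).ToArray();

    // Yield (byteIndex, firstNibbleMatch) for every match at nibble granularity.
    // firstNibbleMatch == true means the match starts at the high nibble of data[byteIndex].
    public static IEnumerable<(int Index, bool FirstNibbleMatch)> Find(byte[] data, string hex)
    {
        byte[] pattern = ToNibbles(hex);
        int totalNibbles = data.Length * 2;
        for (int start = 0; start + pattern.Length <= totalNibbles; start++)
        {
            bool match = true;
            for (int i = 0; i < pattern.Length; i++)
            {
                int pos = start + i;
                // High nibble at even positions, low nibble at odd ones.
                byte nibble = (pos & 1) == 0
                    ? (byte)(data[pos >> 1] >> 4)
                    : (byte)(data[pos >> 1] & 0x0F);
                if (nibble != pattern[i]) { match = false; break; }
            }
            if (match)
                yield return (start >> 1, (start & 1) == 0);
        }
    }

    static void Main()
    {
        // "ABC" starts at the low nibble of the first byte here.
        byte[] data = { 0x1A, 0xBC, 0xDE };
        foreach (var (index, first) in Find(data, "ABC"))
            Console.WriteLine($"{index} {first}");
    }
}
```

The inner loop is O(data × pattern); presumably a real version would do something smarter (e.g. a Boyer-Moore-style skip on whole bytes), which is part of what I'm asking about.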
Before undertaking the task of trying it out, any ideas regarding profiling, the costs of the shift operations vs. ASCII conversions (whether via the C# library or roll-your-own unsafe code, etc.), or any other unforeseen issues? My main concern is performance. Would this approach even perform better in theory?