3

I have an in-memory byte[] and need to locate the offset where 13 and 10 are. I will then use the following to extract that line:

 String oneLine = Encoding.ASCII.GetString(bytes, 0, max);

What is the fastest way to search for the two bytes on a x64 bit computer? ..and convert it to a string?

Is there anything I can do other than iterating through each byte, scanning for 13 and then scan for 10?

// Disclaimer: 
// This is just for my curiosity.  Perhaps I'll gain a better understanding of
// how .NET interfaces with RAM, the CPU instructions related to comparisons, etc.
//  
// I don't suspect a performance problem, but I do suspect a lack of understanding
// (on my part) on how C# does low-level operations.
makerofthings7
  • 60,103
  • 53
  • 215
  • 448
  • 8
    Maybe there is - but have you measured and found that the simple approach isn't *fast enough*? – Jon Skeet Feb 19 '13 at 21:03
  • 1
    would it be faster to conver to a string first and do an `yourString.IndexOf(Environment.NewLine)`? Genuinely curious... – DiskJunky Feb 19 '13 at 21:07
  • 1
    @PaulSasik Well, often a constant factor speed-up can be gained by doing the same thing in a more cache- and CPU-friendly manner. Sometimes that constant factor is significant. However, OP and all other people asking for the fastest way better be aware that this is often a waste of time and the wrong question to ask to begin with. So, OP: Why do you even care? Curiosity (nothing wrong with that)? A gut feeling that the obvious way is too slow? Well-founded (y'know, with profiling and benchmarking) *knowledge* that it's slower than it should be? Only the middle one is a bad reason. –  Feb 19 '13 at 21:09
  • Fastest way ever - create a unmanaged dll and use SCASW - http://tic01.tic.ec-lyon.fr/~muller/trotek/cours/8086/SCASW.html.en – Otávio Décio Feb 19 '13 at 21:10
  • Of course there is something else: fix the array, cast the pointer to `ulong*`, and for each 8-byte block use SWAR to test whether it contains a 13. If it does, scan from the beginning of that block for 13,10, and well you can make the rest up. Whether that's faster, well, who knows. But it's something you could try, if necessary. – harold Feb 19 '13 at 21:11
  • @OtávioDécio what if the 13 is at an odd index? – harold Feb 19 '13 at 21:12
  • @harold - if memory serves me (it's been a while) it shouldn't matter. – Otávio Décio Feb 19 '13 at 21:13
  • @OtávioDécio `scasw` increments the index by 2, so I don't really see how that would work out.. – harold Feb 19 '13 at 21:15
  • @harold - that's what I always used to search a word in a large string, it used to work in the 8086 architecture at the time. – Otávio Décio Feb 19 '13 at 21:19

2 Answers2

1

Not sure if it will be 'the fastest way' but you can look at Boyer-Moore algorithm to find the indexes of the required values.

Have a look at this SO thread Search longest pattern in byte array in C#

Boyer-Moore would be better than a linear array traversal because it can skip elements depending on the length of you 'needle' and it gets better as the 'haystack' gets bigger. HTH.

Community
  • 1
  • 1
Raghu
  • 2,678
  • 2
  • 31
  • 38
1

Since you are looking for a two byte sequence, you don't have to scan every byte, just every other one. If the target index contains a 13, then look at the next byte for a 10. If the target index contatins a 10, look at the previous byte for a 13. That should cut your scan time approximatly in half over a linear search.

user957902
  • 3,010
  • 14
  • 18