In my application I need to open a file, look for a tag and then do some operation based on that tag.
BUT! the file content alternates every char with a \0, so that the text "CODE" becomes 0x43 0x00 0x4F 0x00 0x44 0x00 0x45 0x00 (expressed as hex bytes).
The issue is that the terminator is also a \0, so "CODE123" with the terminator would look something like this:
0x43 0x00 0x4F 0x00 0x44 0x00 0x45 0x00 0x31 0x00 0x32 0x00 0x33 0x00 0x00 0x00
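For reference, this byte layout is the same one .NET's Encoding.Unicode (UTF-16 little-endian) produces for ASCII text, so the search pattern can be built from a string instead of typed out by hand. A minimal sketch, assuming the file really is UTF-16LE; the TagBytes class and its Encode method are hypothetical names used only for illustration:

```csharp
using System;
using System.Text;

public static class TagBytes
{
    // Encodes a tag as UTF-16 little-endian bytes, optionally followed by
    // the two-byte 0x00 0x00 terminator shown in the dump above.
    // Assumption: the file uses UTF-16LE, which matches the dump but is
    // not confirmed by anything else.
    public static byte[] Encode(string tag, bool withTerminator = false)
    {
        string text = withTerminator ? tag + "\0" : tag;
        return Encoding.Unicode.GetBytes(text);
    }
}
```

With this, BitConverter.ToString(TagBytes.Encode("CODE")) gives 43-00-4F-00-44-00-45-00, and TagBytes.Encode("CODE123", true) is the 16-byte sequence above.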
Since \0 is the null string terminator, if I use File.ReadAllText() I get only garbage, so I tried using File.ReadAllBytes() and then purging each byte equal to 0.
This gets me readable text, but then I lose the information about where each piece of data ends, i.e. if the file contains CODE123[terminator]PROP456[terminator]blablabla I end up with CODE123PROP456blablabla.
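To illustrate the difference: if the bytes are decoded as UTF-16LE rather than stripped of zeros, the terminator survives as a real '\0' character that can be split on. This is only a sketch under the assumption that the whole file is UTF-16LE text with an even byte count and no byte-order mark; FieldSplitter is a hypothetical helper name:

```csharp
using System;
using System.Text;

public static class FieldSplitter
{
    // Decodes the raw bytes as UTF-16LE and splits on the '\0' terminator,
    // so each terminated field comes back as its own string.
    // Assumption: data is UTF-16LE text; a leading BOM, if present, would
    // end up as the first character of the first field.
    public static string[] Split(byte[] data)
    {
        string text = Encoding.Unicode.GetString(data);
        return text.Split(new[] { '\0' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For the example above, FieldSplitter.Split(File.ReadAllBytes(path)) would yield CODE123, PROP456 and blablabla as three separate strings (path being whatever file is read).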
So I decided to get the file content as a byte[], and then look for another byte[] initialized with the CODE-with-\0-inside data. This should theoretically work, but since the data array is fairly large (about 1.5 million elements) this takes way too long.
The final cherry on the cake is that I am looking for multiple occurrences of the CODE tag, so I can't just stop as soon as I find the first one.
I tried modifying the LINQ posted as an answer here: "Find the first occurrence/starting index of the sub-array in C#", as follows:
// x is the file content (byte[]), y is the pattern to look for (byte[])
var indices = (from i in Enumerable.Range(0, 1 + x.Length - y.Length)
               where x.Skip(i).Take(y.Length).SequenceEqual(y)
               select (int?)i).ToList();
but as soon as I try to enumerate the result it just bogs down.
So, my question is: how could I EFFICIENTLY find multiple subarrays in a large array? Thanks.
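For comparison with the LINQ query, here is a sketch of a plain single-pass loop that collects every match index. The query builds fresh enumerators for each start index, and Skip(i) may have to walk i elements before it yields anything, which is likely where the time goes. The loop below indexes the array directly instead. ByteSearch and FindAll are hypothetical names, and this is a straightforward scan, not a specialized algorithm such as Knuth-Morris-Pratt or Boyer-Moore:

```csharp
using System;
using System.Collections.Generic;

public static class ByteSearch
{
    // Returns the start index of every occurrence of pattern in data,
    // including overlapping ones. Plain nested-loop scan with direct
    // array indexing; nothing is allocated per candidate position.
    public static List<int> FindAll(byte[] data, byte[] pattern)
    {
        var hits = new List<int>();
        if (pattern.Length == 0 || pattern.Length > data.Length) return hits;

        int last = data.Length - pattern.Length;
        for (int i = 0; i <= last; i++)
        {
            if (data[i] != pattern[0]) continue;   // cheap first-byte reject

            int j = 1;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;

            if (j == pattern.Length) hits.Add(i);  // full pattern matched at i
        }
        return hits;
    }
}
```

Usage would be along the lines of ByteSearch.FindAll(x, y), with x the bytes from File.ReadAllBytes() and y the tag bytes. The worst case is still proportional to x.Length * y.Length, but for a short tag in about 1.5 million bytes most positions are rejected on the first comparison.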