
I want to read a file's contents and search the data for a hex pattern. `File.ReadAllBytes` feels like overkill, because I only need to read byte by byte until I find a match. Is there a better alternative, or is `File.ReadAllBytes` actually the better choice for performance? The code below currently works as is.

The file I am attempting to read is a simple text file containing "hello".

string match = "68656C6C6F";

foreach (var jsfile in jsscan)
{
    byte[] data = File.ReadAllBytes(jsfile);
    string dataString = String.Concat(data.Select(b => b.ToString("X2")));
    if (dataString.Contains(match))
    {
        MessageBox.Show(jsfile + dataString);
    }
}
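
One caveat with the hex-string comparison above: `Contains` can also match at an odd character offset, straddling two bytes (for example, the bytes 0x06 0x86 render as "0686", which contains "68"). A minimal sketch of comparing the bytes directly instead, assuming the whole file is still read into memory; `ByteSearch` and `IndexOf` are illustrative names, not part of the original code:

```csharp
using System;

public static class ByteSearch
{
    // Returns the index of the first occurrence of pattern in data, or -1 if absent.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;

        // Try every possible start position; this is the naive O(n * m) search.
        for (int start = 0; start <= data.Length - pattern.Length; start++)
        {
            int j = 0;
            while (j < pattern.Length && data[start + j] == pattern[j]) j++;
            if (j == pattern.Length) return start;
        }
        return -1;
    }
}
```

It could be called as `ByteSearch.IndexOf(File.ReadAllBytes(jsfile), new byte[] { 0x68, 0x65, 0x6C, 0x6C, 0x6F })`, which skips the hex-string conversion entirely.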

Updated solution thanks to mfatih:

public void example()
{

    string match = "68656C6C6F"; //This is "hello" in hex
    byte[] matchBytes = StringToByteArray(match);


    foreach (var jsFile in jsscan)
    {
        using (var fs = new FileStream(jsFile, FileMode.Open, FileAccess.Read)) // read-only access is enough for scanning
        {
            int i = 0;
            int readByte;
            while ((readByte = fs.ReadByte()) != -1)
            {
                if (matchBytes[i] == readByte)
                {
                    i++;
                }
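                else
                {
                    // Rewind to just after the start of the failed partial match,
                    // so an overlapping match (e.g. "hello" in "hhello") is not skipped.
                    fs.Position -= i;
                    i = 0;
                }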
                if (i == matchBytes.Length)
                {
                    Console.WriteLine("Found between {0} and {1}.",
                        fs.Position - matchBytes.Length, fs.Position);
                    break;
                }
            }
        }
    }
}
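// Converts a hex string such as "68656C6C6F" to the bytes it encodes.
public static byte[] StringToByteArray(string hex)
{
    if (hex.Length % 2 != 0)
        throw new ArgumentException("Hex string must have an even number of characters.", nameof(hex));

    byte[] bytes = new byte[hex.Length / 2];
    for (int i = 0; i < hex.Length; i += 2)
        bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
    return bytes;
}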
DropItLikeItsHot
  • Have you considered converting the input hex numbers to a string and performing the comparison on that? – sujith karivelil Jan 04 '17 at 04:09
  • Hey un-lucky! Yes, I believe that's what I'm doing right now, right? I am converting the output of `File.ReadAllBytes` to hex for each file, then using `Contains` to match the hex as a string. I'm just wondering if there's a more efficient way that doesn't read the whole file into memory. I don't want to start at a specific offset either; I want to search the entire file. – DropItLikeItsHot Jan 04 '17 at 04:11
  • Possible duplicate of [Best way to find position in the Stream where given byte sequence starts](http://stackoverflow.com/questions/1471975/best-way-to-find-position-in-the-stream-where-given-byte-sequence-starts) – Maksim Simkin Jan 04 '17 at 06:47
  • Hi, this is great! It finds the position of the string. How can I now read a string of my desired length starting from that location? – CodyMan May 29 '18 at 09:11

1 Answer

There is a more efficient way that avoids reading the whole file into memory: read the stream one byte at a time and compare it against the pattern as you go. I hope this helps.

string match = "68656C6C6F";

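// Caution: this encodes the characters of the hex string themselves,
// not the bytes 0x68 0x65 0x6C 0x6C 0x6F (see the comments below).
byte[] matchBytes = Encoding.ASCII.GetBytes(match);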

foreach (var jsFile in jsscan)
{
    using (var fs = new FileStream(jsFile, FileMode.Open, FileAccess.Read)) // read-only access is enough for scanning
    {
        int i = 0;
        int readByte;
        while ((readByte = fs.ReadByte()) != -1)
        {
            if (matchBytes[i] == readByte)
            {
                i++;
            }
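            else
            {
                // Rewind to just after the start of the failed partial match,
                // so an overlapping match (e.g. "hello" in "hhello") is not skipped.
                fs.Position -= i;
                i = 0;
            }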
            if (i == matchBytes.Length)
            {
                Console.WriteLine("Found between {0} and {1}.",
                    fs.Position - matchBytes.Length, fs.Position);
                break;
            }
        }
    }
}
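
A byte-by-byte search has to cope with a match that overlaps a failed partial match (for example `hello` inside `hhello`). One way to handle that without seeking backwards, which also works on non-seekable streams, is a Knuth-Morris-Pratt failure table. A minimal sketch under those assumptions; `StreamSearch`, `IndexOf`, and the 4096-byte buffer size are illustrative choices, not part of the answer's code:

```csharp
using System;
using System.IO;

public static class StreamSearch
{
    // Returns the offset of the first occurrence of pattern, counted from
    // where reading starts, or -1 if the pattern does not occur.
    public static long IndexOf(Stream stream, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;

        // failure[k] = length of the longest proper prefix of pattern[0..k]
        // that is also a suffix of it (the Knuth-Morris-Pratt table).
        int[] failure = new int[pattern.Length];
        for (int k = 1, len = 0; k < pattern.Length; k++)
        {
            while (len > 0 && pattern[k] != pattern[len]) len = failure[len - 1];
            if (pattern[k] == pattern[len]) len++;
            failure[k] = len;
        }

        byte[] buffer = new byte[4096];
        long offset = 0;  // index of the byte currently being examined
        int matched = 0;  // number of pattern bytes matched so far
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int n = 0; n < read; n++, offset++)
            {
                // On a mismatch, fall back to the longest prefix that can still match.
                while (matched > 0 && buffer[n] != pattern[matched]) matched = failure[matched - 1];
                if (buffer[n] == pattern[matched]) matched++;
                if (matched == pattern.Length) return offset - pattern.Length + 1;
            }
        }
        return -1;
    }
}
```

Reading in chunks keeps memory use constant regardless of file size, and the search stops as soon as the first match is found.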
Maksim Simkin
mfatih
  • This looks solid, but I am not able to get it to work when reading files into file stream. By "I can't get it to work" I mean it won't show the console output for the bytes being found. I have made sure it's looking at the right file too. – DropItLikeItsHot Jan 04 '17 at 07:35
  • Okay I've debugged it, and it looks like it's failing at if (matchBytes[i] == readByte). I believe "byte[] matchBytes = Encoding.ASCII.GetBytes(match);" is incorrectly implemented. It shows matchBytes as being null. Any ideas? – DropItLikeItsHot Jan 04 '17 at 08:03
  • Just confirmed the GetBytes function isn't working correctly by replacing that with "byte[] matchBytes = {104,101,108,108,111};" and it worked as expected. I'll do some more research and update my post if I find the cause. – DropItLikeItsHot Jan 04 '17 at 08:11
  • Make sure you add `using System.Text;` or you can try `byte[] matchBytes = System.Text.Encoding.ASCII.GetBytes(match);`. – mfatih Jan 04 '17 at 08:40
  • Yes, I have the code identical to what you created in your answer, it still doesn't read bytes from string as expected? – DropItLikeItsHot Jan 04 '17 at 16:21
  • The string I mentioned is in Hex, but you're using ASCII.GetBytes. I am building something to convert the Hex string right now to a byte array to solve this issue. – DropItLikeItsHot Jan 04 '17 at 23:17
  • Updated solution, working flawlessly. Thank you for your help mfatih! – DropItLikeItsHot Jan 05 '17 at 00:12
  • This link can give you more efficient conversion options for StringToByteArray: http://stackoverflow.com/questions/472906/how-to-get-a-consistent-byte-representation-of-strings-in-c-sharp-without-manual – mfatih Jan 05 '17 at 05:46
  • @mfatih Hi, this is great! It finds the position of the string. How can I now read a string of my desired length starting from that location? – CodyMan May 29 '18 at 09:13
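
On the last question, reading a string of a given length from the found position: one possible approach is to seek to the match's start offset (`fs.Position - matchBytes.Length` in the loop above) and read the desired number of bytes. A minimal sketch, assuming a seekable stream and ASCII text; `StreamSlice` and `ReadAscii` are hypothetical names:

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamSlice
{
    // Reads up to count bytes starting at offset and decodes them as ASCII.
    // Returns a shorter string if the stream ends first.
    public static string ReadAscii(Stream stream, long offset, int count)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        byte[] buffer = new byte[count];
        int total = 0;
        int read;
        // Read may return fewer bytes than requested, so loop until done or end of stream.
        while (total < count && (read = stream.Read(buffer, total, count - total)) > 0)
        {
            total += read;
        }
        return Encoding.ASCII.GetString(buffer, 0, total);
    }
}
```

Note that this moves the stream position, so it fits best after the search loop has finished with the file.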