I am writing a program that reads '.exe' files and stores their contents in a byte array, so that they can be compared against an array of known values (like a very simple virus scanner).

byte[] buffer = File.ReadAllBytes(currentDirectoryContents[j]);

I have then used BitConverter to create a single string of these values

string hex = BitConverter.ToString(buffer);

The next step is to search this string for a series of values (definitions) and return positive for a match. This is where I am running into problems. My definitions are hex values, but they were created and saved in Notepad as definitions.xyz

string[] definitions = File.ReadAllLines(@"C:\definitions.xyz");

I had been trying to read them into a string array and compare the definition elements of the array with string hex

bool[] test = new bool[currentDirectoryContents.Length];

test[j] = hex.Contains(definitions[i]);

This IS a section from a piece of homework, which is why I am not posting my entire code for the program. I had not used C# before last Friday so am most likely making silly mistakes at this point.

Any advice much appreciated :)

Yahia
  • Can you post the contents of your definitions file, or a section of it, please? – Myles McDonnell Feb 06 '12 at 13:12
  • And what's the problem exactly? – GazTheDestroyer Feb 06 '12 at 13:15
  • What is your question? Another important thing: converting a byte array to a string for a further hex-comparison is really not effective. You should compare bytes with bytes and **not** use strings here. – ken2k Feb 06 '12 at 13:15
  • You give a reasonable overview of what you are trying... but: it isn't clear where you have become stuck. What is happening? or not happening? (personally I agree with ken2k that I wouldn't use hex strings here... but: that is an implementation detail) – Marc Gravell Feb 06 '12 at 13:17
  • Myles: the definitions file contains a string of hex that I made up, as the .exe file to be scanned is not a true .exe, but rather a Notepad file that I saved with the exe extension. –  Feb 06 '12 at 13:18
  • Ken2k and Marc: This doesn't work for me, as the program reads the definitions file as an array of strings, and the exe that is BitConverted is in HEX. The program doesn't set test[j] to true even when there should be a match. –  Feb 06 '12 at 13:20
  • If I understand correctly, you're trying to find a specific sequence of bytes in a file. This question might be useful: http://stackoverflow.com/questions/1471975/best-way-to-find-position-in-the-stream-where-given-byte-sequence-starts – spender Feb 06 '12 at 13:21
  • If your definition file is a list of hexadecimal strings, then **parse those strings and convert them to bytes**. Don't convert the bytes of the file you read into hex strings; it's not effective. – ken2k Feb 06 '12 at 13:30

2 Answers

It is pretty unclear exactly what format you use for the definitions. Base64 is a good encoding for a byte[]; you can rapidly convert back and forth with Convert.ToBase64String() and Convert.FromBase64String(). But your question suggests the bytes are encoded in hex. Let's assume it looks like "01020304" for a new byte[] { 1, 2, 3, 4 }. Then this helper function converts such a string back to a byte[]:

    static byte[] Hex2Bytes(string hex) {
        if (hex.Length % 2 != 0) throw new ArgumentException();
        var retval = new byte[hex.Length / 2];
        for (int ix = 0; ix < hex.Length; ix += 2) {
            retval[ix / 2] = byte.Parse(hex.Substring(ix, 2), System.Globalization.NumberStyles.HexNumber);                
        }
        return retval;
    }

You can now do a fast pattern search with an algorithm like Boyer-Moore.
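
As a rough illustration of that search step, here is a minimal sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that only uses the bad-character table. The class name `BytePatternSearch` and the method name `IndexOf` are made up for this example and are not part of the .NET framework.

```csharp
using System;

static class BytePatternSearch
{
    // Returns the index of the first occurrence of pattern in data, or -1 if absent.
    // Hypothetical helper: a Boyer-Moore-Horspool search, not the full Boyer-Moore.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (pattern == null) throw new ArgumentNullException("pattern");
        if (pattern.Length == 0) return 0;
        if (pattern.Length > data.Length) return -1;

        // Shift table: how far the window may move, keyed by the data byte
        // that is currently aligned with the last position of the pattern.
        var shift = new int[256];
        for (int i = 0; i < shift.Length; i++) shift[i] = pattern.Length;
        for (int i = 0; i < pattern.Length - 1; i++)
        {
            shift[pattern[i]] = pattern.Length - 1 - i;
        }

        int pos = 0;
        while (pos <= data.Length - pattern.Length)
        {
            // Compare the window against the pattern from right to left.
            int j = pattern.Length - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            pos += shift[data[pos + pattern.Length - 1]];
        }
        return -1;
    }
}
```

With this in place, a check along the lines of `BytePatternSearch.IndexOf(buffer, Hex2Bytes(definition)) >= 0` would compare bytes with bytes, with no hex string built from the file at all.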

Hans Passant
  • There is no specified format for the definitions in the file whatsoever. I have chosen to have each definition be a sequence of bits (in hexadecimal). This is from research into methods of virus scanning and deciding on signatures. For now, to start, my signature file has one definition in it: A6 7C FD 1B 45 82 90 1D 6F 3C 8A OF 96 18 A4 C3 4F FF 0F 1D. One .exe in the folder will contain this series of bytes, because it will have been created in Notepad and just saved as an exe. There is no specific file format for the definitions file. I have just been using a random extension for now. –  Feb 06 '12 at 14:50
  • Well, having a specified format is rather important. The code I posted ought to be close, except you need to use 3 instead of 2 because of the spaces. – Hans Passant Feb 06 '12 at 14:55
  • Have solved now, thank-you for your time taken to advise me of a possible solution! –  Feb 08 '12 at 12:08
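
For the space-separated format described in the comments above, one possible sketch is shown below. Rather than stepping through the string three characters at a time, it splits on the separators, which also tolerates dashes and repeated spaces. The names `SpacedHex` and `Parse` are invented for this example.

```csharp
using System;
using System.Globalization;

static class SpacedHex
{
    // Converts a line such as "A6 7C FD" (or "A6-7C-FD") into the bytes it spells out.
    // Hypothetical helper; assumes one or two hex digits per token.
    public static byte[] Parse(string line)
    {
        string[] tokens = line.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
        var bytes = new byte[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
        {
            // Throws FormatException if a token is not valid hexadecimal.
            bytes[i] = byte.Parse(tokens[i], NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        }
        return bytes;
    }
}
```

Note that a token that is not valid hex makes byte.Parse throw a FormatException. The `OF` in the definition quoted above, if it is really typed with the letter O rather than a zero, would be one such token.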

I expect you understand that this is a very inefficient way to do it. But except for that, you should just do something like this:

    bool[] test = new bool[currentDirectoryContents.Length];
    for (int i = 0; i < test.Length; i++) {
        // Read each file once and convert it to the "XX-XX-..." hex form.
        byte[] buffer = File.ReadAllBytes(currentDirectoryContents[i]);
        string hex = BitConverter.ToString(buffer);
        test[i] = ContainsAny(hex, definitions);
    }

    // Returns true if s contains at least one of the given values.
    bool ContainsAny(string s, string[] values) {
        foreach (string value in values) {
            if (s.Contains(value)) {
                return true;
            }
        }
        return false;
    }

If you can use LINQ, you can do it like this:

    // Convert each file to hex once, then test it against every definition.
    var test = currentDirectoryContents
        .Select(file => BitConverter.ToString(File.ReadAllBytes(file)))
        .Select(hex => definitions.Any(definition => hex.Contains(definition)))
        .ToArray();

Also, make sure that your definitions-file is formatted in a way that matches the output of BitConverter.ToString(): upper-case with dashes separating each encoded byte:

12-AB-F0-34
54-AC-FF-01-02 
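
If the definitions file uses spaces or lower-case digits instead, the definitions could be rewritten into that form before the comparison. The following is only a sketch: `DefinitionFormat` and `Normalize` are names made up for this example, and it assumes every byte in the definition is written as a two-digit token.

```csharp
using System;

static class DefinitionFormat
{
    // Rewrites "a6 7c fd" as "A6-7C-FD", the layout BitConverter.ToString() produces.
    // Hypothetical helper; assumes tokens are separated by spaces or dashes.
    public static string Normalize(string definition)
    {
        string[] tokens = definition.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join("-", tokens).ToUpperInvariant();
    }
}
```

For example, `DefinitionFormat.Normalize("a6 7c fd")` and `BitConverter.ToString(new byte[] { 0xA6, 0x7C, 0xFD })` both give `A6-7C-FD`, so the Contains() check can succeed.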
Rasmus Faber
  • I do understand the inefficiency of the code, but I am not an experienced coder and am happy with writing working code before looking at improvements. I realised my entire code was working, but the code contained in the exe file I was reading was already in HEX. So my code was just converting each byte into a different HEX value. Now resolved. Thank-you all for your contributions and suggestions! –  Feb 08 '12 at 12:06
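
The effect described in that last comment can be reproduced in a few lines. This sketch assumes the test file was saved by Notepad in a single-byte encoding such as ANSI, so that each typed character is stored as one byte; the class and method names are invented for the example.

```csharp
using System;
using System.Text;

static class TextVersusBytes
{
    // Shows what BitConverter.ToString() yields for a file whose content is hex *text*.
    // Assumption: one byte per character (ASCII/ANSI), as Notepad typically saves plain text.
    public static string HexOfText(string text)
    {
        byte[] raw = Encoding.ASCII.GetBytes(text);
        return BitConverter.ToString(raw);
    }

    // If the scanned file really holds hex text, decoding it back to a string
    // allows a direct comparison with the definition.
    public static bool ContainsDefinitionAsText(byte[] fileBytes, string definition)
    {
        return Encoding.ASCII.GetString(fileBytes).Contains(definition);
    }
}
```

Here `TextVersusBytes.HexOfText("A6 7C")` gives `41-36-20-37-43`: the character codes of 'A', '6', space, '7' and 'C', not the bytes A6 and 7C. That is why the hex string built from the file never contained the definition.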