
Goal (General)
My ultimate (long-term) goal is to write an importer that brings a binary file format into another application.

Question Background

  • I am interested in two fields within a binary file format. One is encrypted, and the other is compressed and possibly also encrypted (See how I arrived at this conclusion here).
  • I have a viewer program (I'll call it viewer.exe) which can open these files for viewing. I'm hoping this can offer up some clues.
  • I will (soon) have a correlated deciphered output to compare and have values to search for.
  • This is the most relevant stackoverflow Q/A I have found

Question Specific
Given the resources I have, what is the best strategy for identifying the algorithm being used?

Current Ideas

  • I realize that without the key, identifying the algo from just data is practically impossible

  • Since I have both a file and viewer.exe, I must have the key somewhere. Whether it's public, private, symmetric, etc. would be nice to figure out.

  • I would like to disassemble the viewer.exe using OllyDbg with the findcrypt plugin as a first step. I'm just not proficient enough in this kind of thing to accomplish it yet.

Resources
full example file
extracted binary from the field I am interested in
decrypted data: In this zip archive there is a binary list of floats representing x,y,z (model2.vertices) and a binary list of integers (model2.faces). I have also included an "stl" file, which you can view with many free programs, but because of the weird way data is stored in STLs, it is not what we expect to come out of the original file.

Progress
1. I disassembled the program with Olly, then did the only thing I know how to do at this point: I paused the program right before it imports one of the files and "searched for all referenced text". Then I searched for strings like "crypt, hash, AES, encrypt, SHA", etc. I came up with a bunch of things, most notably "Blowfish64", which fits nicely with the fact that my data is occasionally 4 bytes too long. Since the vertex data is guaranteed to be a multiple of 12 bytes, this looks to me like padding to a 64-bit block size (an odd number of vertices gives a byte count that is not a multiple of 8). I also found error messages like...

"Invalid data size, (Size-4) mod 8 must be 0"
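
The padding arithmetic above can be checked in a few lines of Python. This is only a sketch: it assumes the vertex data is 12 bytes per vertex and is padded up to a whole number of 8-byte Blowfish blocks, which is what the strings hint at but is not confirmed. The function name is made up for illustration.

```python
BLOCK_SIZE = 8    # Blowfish block size in bytes (64 bits); assumed from the "Blowfish64" string
VERTEX_SIZE = 12  # three 32-bit floats: x, y, z

def padded_size(vertex_count):
    """Size of the vertex data after padding up to a whole number of blocks."""
    raw = vertex_count * VERTEX_SIZE
    padding = -raw % BLOCK_SIZE  # bytes needed to reach the next multiple of 8
    return raw + padding

for n in (1000, 1001):
    raw = n * VERTEX_SIZE
    print(n, raw, padded_size(n) - raw)
```

An even vertex count needs no padding, while an odd count needs exactly 4 bytes, which matches the "4 bytes too long" observation.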

After reading Igor's response below, here is the output from signsrch. I've annotated this image with dots: green for constants that cause no problems when replaced by int3, red if the program can't start, and orange if it fails when loading a file of interest. No dot means I haven't tested it yet.

Signsrch results annotated

Accessory Info

  • I'm using Windows 7 64-bit
  • viewer.exe is a Win32 x86 application
  • The data is base64 encoded as well as encrypted
  • The deciphered data is groups of 12 bytes representing 3 floats (x,y,z coordinates)
  • I have OllyDbg v1.1 with the findcrypt plugin, but my usage is limited to following along with this guy's YouTube videos
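
The two data-layout facts in the list above (a base64 layer, and 12-byte groups of three floats once deciphered) can be sketched in Python. The little-endian byte order is an assumption based on viewer.exe being an x86 program, and both function names are hypothetical.

```python
import base64
import struct

def decode_field(text):
    """Undo the base64 layer only; the result is still encrypted."""
    return base64.b64decode(text)

def unpack_vertices(plain):
    """Split deciphered bytes into (x, y, z) tuples of 32-bit floats.

    Assumes little-endian floats; trailing bytes that do not fill a
    whole 12-byte group (for example block padding) are ignored.
    """
    usable = len(plain) - len(plain) % 12
    return [struct.unpack_from("<3f", plain, off) for off in range(0, usable, 12)]
```

Comparing the output of unpack_vertices against the known-good model2.vertices file would be one way to confirm that a decryption attempt succeeded.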
patmo141
  • why aren't you asking this question on superuser? – President James K. Polk Feb 25 '12 at 21:09
  • IDA has a technique called FLIRT, that identifies known functions. If they use a standard crypto library, it might identify the algorithm. – CodesInChaos Feb 25 '12 at 21:10
  • @GregS Because I wasn't aware that is a better place to ask. I'll check that out asap – patmo141 Feb 25 '12 at 21:16
  • I'm not convinced that SuperUser is better than StackOverflow for this question. You might consider http://security.stackexchange.com/ too. I'm not sure that it is significantly better than SO, though. – Jonathan Leffler Feb 25 '12 at 21:29
  • @CodeInChaos Hmm, I have the freeware IDA, is FLIRT builtin or a plugin? I'll check that out too. – patmo141 Feb 25 '12 at 21:41
  • @JonathanLeffler: sounds more like a tools question than a programming question. – President James K. Polk Feb 26 '12 at 13:37
  • If the authors were _really_ sneaky, they'd hide important data by e.g. splitting it in two parts that need to be xor-ed, as to fool these signature checkers. @patmo141: try finding a file with a small number of vertices and facets like a cube. That would be easier to brute-force. – Roland Smith Feb 27 '12 at 23:06
  • Unfortunately that won't be possible because it's scanner data, so it will have a few thousand vertices at a minimum. – patmo141 Feb 27 '12 at 23:30
  • @patmo141: Is there anywhere we can download the viewer.exe app? – Leigh Feb 28 '12 at 17:00
  • @Leigh There is, it is publicly available. I just didn't want to rip open a company's file format on the web. Perhaps you can email me patrick dot moore dot bu @ gmail period com – patmo141 Feb 28 '12 at 21:23

2 Answers


Many encryption algorithms use very specific constants to initialize the encryption state. You can check if the binary has them with a program like signsrch. If you get any plausible hits, open the file in IDA and search for the constants (Alt-B (binary search) would help here), then follow cross-references to try and identify the key(s) used.
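
To illustrate the idea of searching for initialization constants, here is a minimal Python sketch that looks for the first four words of the standard Blowfish P-array (they are the leading hex digits of pi). It is not a replacement for signsrch, which knows far more signatures; the function names are made up, and a modified or obfuscated table would not be found this way.

```python
import struct

# First four words of the standard Blowfish P-array (hex digits of pi).
BLOWFISH_P = (0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344)

def find_constant_table(data, words=BLOWFISH_P):
    """Return (file_offset, byte_order) for each place the words appear back to back."""
    hits = []
    for label, fmt in (("little", "<"), ("big", ">")):
        needle = struct.pack(fmt + "%dI" % len(words), *words)
        start = data.find(needle)
        while start != -1:
            hits.append((start, label))
            start = data.find(needle, start + 1)
    return hits

def scan_file(path):
    """Read a binary from disk and scan it; the path is supplied by the caller."""
    with open(path, "rb") as handle:
        return find_constant_table(handle.read())
```

Calling scan_file on the viewer binary returns offsets into the file on disk, not addresses in the loaded process, so they still have to be mapped to code locations in the disassembler before following cross-references.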

Igor Skochinsky
  • Wow, that is pretty excellent. See the photo of the signsrch output. The following cross references part will be difficult for me as I'm still new with Olly, IDA etc. – patmo141 Feb 27 '12 at 21:38
  • OK, I understand that signsrch has identified constants, but how do I search for them? For example, for the Blowfish bfp table: is 00856893 the actual constant, or is it the location of the constant within the program? My apologies for the naive response – patmo141 Feb 28 '12 at 05:15
  • It seems you can use the -L option to get the actual constants. However, if those numbers are file offsets, you can use "Jump-Jump to file offset" to navigate there. – Igor Skochinsky Feb 28 '12 at 12:27

You can't differentiate good encryption (AES with XTS mode for example) from random data. It's not possible. Try using ent to compare /dev/urandom data and TrueCrypt volumes. There's no way to distinguish them from each other.
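
As a rough stand-in for the entropy figure that ent reports, the following Python sketch computes Shannon entropy in bits per byte. It is not ent itself and runs none of its other tests; the function name is chosen for illustration.

```python
import math
import os
from collections import Counter

def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 is the maximum."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Random bytes sit very close to 8 bits per byte; well-encrypted data does too,
# which is why this measurement cannot tell the two apart.
print(shannon_entropy(os.urandom(65536)))
```

A value near 8.0 for the extracted field would be consistent with encryption or compression, but it would not say which algorithm produced it.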

Edit: Re-reading your question. The best way to determine which symmetric algorithm, hash and mode is being used (when you have a decryption key) is to try them all. Brute-force the possible combinations and have some test to determine if you do successfully decrypt. This is how TrueCrypt mounts a volume. It does not know the algo beforehand so it tries all the possibilities and tests that the first few bytes decrypt to TRUE.
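
The try-them-all approach could be organized along these lines in Python. Everything here is an assumption for illustration: the plausibility test presumes the plaintext is little-endian 32-bit floats of modest magnitude (as the question describes), and the XOR function is only a toy stand-in for real cipher routines, which would come from a crypto library.

```python
import math
import struct

def looks_like_vertices(plain, limit=1.0e4):
    """Heuristic: every 32-bit float is finite and within a plausible range."""
    usable = len(plain) - len(plain) % 4
    if usable == 0:
        return False
    values = struct.unpack("<%df" % (usable // 4), plain[:usable])
    return all(math.isfinite(v) and abs(v) <= limit for v in values)

def try_all(ciphertext, key, candidates):
    """Return the names of candidate decryptors whose output passes the check."""
    matches = []
    for name, decrypt in candidates.items():
        try:
            plain = decrypt(key, ciphertext)
        except Exception:
            continue  # wrong block size, bad key length, and so on
        if looks_like_vertices(plain):
            matches.append(name)
    return matches

def xor_stand_in(key, data):
    """Toy single-byte XOR, used here only to exercise the loop; not a real cipher."""
    return bytes(b ^ key[0] for b in data)

demo_plain = struct.pack("<3f", 1.5, -2.0, 30.25)
demo_cipher = xor_stand_in(b"\x5a", demo_plain)
demo_candidates = {"xor": xor_stand_in, "identity": lambda key, data: data}
print(try_all(demo_cipher, b"\x5a", demo_candidates))
```

The float check is weak on its own, because very small garbage values also pass, so comparing against the known decrypted vertex list would be a much stronger test when it is available.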

01100110
  • You're correct, but you didn't read the OP. "I realize that without the key, identifying the algo from just data is practically impossible". That's why he wants to figure out the algorithm with a disassembler. – CodesInChaos Feb 25 '12 at 21:12
  • Thanks. Option 1: Brute force with all reasonable methods once I find the key. – patmo141 Feb 25 '12 at 21:26