
Update: The original question is no longer the right question for this problem, so I'm leaving it as is to show what I tried and learned, and for background. It's clear that this is not just a "Base64 variation" and is a bit more involved.

Background: I program in Python 3.x, mainly for use with the open source program Blender. I'm a novice/amateur level programmer, but I understand the big concepts fairly well. I've read these articles relevant to my question.

Problem: I have a binary file which contains 3D mesh data (lists of floats and lists of integers): the x, y, z coordinates of each vertex (floats) and the indices of the vertices which make up the faces of the mesh (integers). The file is organized in an XML-like way:

<SomeFieldLabel and header like info>**thensomedatabetween**</SomeFieldLabel>

Here is the example from the "Vertices" field

<Vertices vertex_count="42816" base64_encoded_bytes="513792" check_value="4133547451">685506bytes of b64 encoded data
</Vertices>
  1. There are 685506 bytes of data between "Vertices" and "/Vertices"
  2. Those bytes only consist of a-z, A-Z, 0-9, and +,/ which is standard for base64
  3. When I grab those bytes, and use standard base64decode in python, I get 513792 bytes back out
  4. If vertex_count="42816" can be believed, there should be 42816*12bytes needed to represent x,y,z for each vertex. 42816*12 = 513792. excellent.
  5. Now, if I try to unpack my decoded bytes as 32-bit floats, I get garbage...so something is amiss.
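
The decode-and-unpack attempt in the steps above can be sketched roughly as follows. This is a minimal sketch, assuming the payload is a flat array of 32-bit floats in x, y, z order; `decode_vertices` is a hypothetical helper name, and the sample data is synthetic, not taken from the real files.

```python
import base64
import struct


def decode_vertices(b64_text, vertex_count, byte_order="<"):
    """Base64-decode a payload and unpack it as (x, y, z) float triples.

    Assumes 3 x 32-bit floats per vertex. byte_order is "<" for
    little-endian or ">" for big-endian.
    """
    raw = base64.b64decode(b64_text)
    expected = vertex_count * 12
    if len(raw) < expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(raw)))
    # Ignore any trailing bytes beyond the declared vertex data.
    fmt = "%s%df" % (byte_order, vertex_count * 3)
    floats = struct.unpack(fmt, raw[:expected])
    return [floats[i:i + 3] for i in range(0, len(floats), 3)]


# Round trip on synthetic data: two vertices packed little-endian.
sample = base64.b64encode(struct.pack("<6f", 1.0, 2.0, 3.0, -0.5, 0.25, 8.0))
vertices = decode_vertices(sample, 2)
```

Calling it once with `"<"` and once with `">"` covers both byte orders. If both give nonsense on the real payload, the bytes are presumably not plain floats at that stage.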

I'm thinking there is an extra cryptographic step somewhere. Perhaps there is a translation table, rotation cipher or some kind of stream cipher? It's strange that the number of bytes is correct but the results are not, which should limit the possibilities. Any ideas? I don't want to publicly out this file format; I just want to write an importer for Blender so I can use the models.

Here are two example files with the file extension changed to *.mesh. I have extracted the raw binary (not b64 decoded) from the Vertices and Facets fields, and I have also provided the bounding box information from a "Viewer" for this type of file, which is supplied by the company.
Example File 1

Example File 2

Notes About the Vertices field

  • The header specifies the vertex_count
  • The header specifies base64_encoded_bytes which is the # of bytes BEFORE base64 encoding takes place
  • The header specifies a "check_value" whose significance is yet to be determined
  • The data in the field only contains the standard base64 characters
  • After standard base64 decoding, the output length equals vertex_count*12, which equals base64_encoded_bytes. Occasionally there seem to be 4 extra bytes in the b64 output. The ratio of encoded to decoded bytes is 4/3, which is also typical of base64.
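
The size relationships in these notes can be checked with a few lines of arithmetic. This is a sketch only; `check_sizes` is a hypothetical helper, and the 12 bytes per vertex is the assumption of three 32-bit floats made earlier.

```python
def check_sizes(vertex_count, declared_bytes, b64_length):
    """Compare header values against what plain, padded base64 would produce."""
    raw_expected = vertex_count * 12                 # 3 floats x 4 bytes (assumed)
    b64_expected = (declared_bytes + 2) // 3 * 4     # padded base64 length
    return {
        "raw_matches_header": raw_expected == declared_bytes,
        "b64_expected": b64_expected,
        "b64_surplus": b64_length - b64_expected,
    }


# Values from the "Vertices" header and the byte count quoted above.
report = check_sizes(42816, 513792, 685506)
```

For the example header, the raw size matches exactly, while the quoted 685506 bytes is 450 more than the 685056 characters plain base64 needs. That surplus might be line breaks or other whitespace inside the field, or a miscount; the post does not say.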

Notes about the Facets field

  • The header specifies a facet_count
  • The header specifies base64_encoded_bytes which is the # of bytes BEFORE base64 encoding takes place

  • The ratio of base64_encoded_bytes/facet_count seems to vary quite a bit, from 1.1 to about 1.2. We would expect a ratio of 12 if the facets were encoded as 3x4-byte integers corresponding to the vertex indices. So either this field is compressed, or the model is saved with triangle strips, or both :-/
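
To put the 1.1 to 1.2 ratio in perspective, here is a rough lower bound on bytes per triangle if each facet stored three explicit vertex indices with no compression. It is a sketch under that assumption; `min_bytes_per_triangle` is a hypothetical helper, and the vertex count is borrowed from the Vertices example.

```python
def min_bytes_per_triangle(vertex_count):
    """Bytes per triangle if each index used only as many bits as needed."""
    bits_per_index = max(1, (vertex_count - 1).bit_length())
    return 3 * bits_per_index / 8


naive = 3 * 4                            # three 4-byte integers per facet
tight = min_bytes_per_triangle(42816)    # 16 bits per index for this mesh
```

Even bit-packed indices would cost about 6 bytes per triangle here, so a ratio near 1.2 suggests the field is not a plain list of index triples.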

More Snooping
I opened up the viewer.exe (in a hex editor) which is provided by the company to view these files (also where I got the bounding box info). Here are some snippets which I found interesting and could further the search.

f_LicenseClient...Ì.@......m_wApplicationID.....@......f_bSiteEncryptionActive.....@......f_bSaveXXXXXXInternalEncrypted.....@......f_bLoadXXXXXXInternalEncrypted...¼!@......f_strSiteKey....í†......

In LoadXXXXXXInternalEncrypted and SaveXXXXXXInternalEncrypted I've blocked out the company name with XX. It looks like we definitely have some encryption beyond a simple base64 table variation.

SaveEncryptedModelToStream.................Self...pUx....Model...ˆÃC....Stream....

This to me looks like a function definition on how to save an encrypted model.

DefaultEncryptionMethod¼!@........ÿ.......€...€ÿÿ.DefaultEncryptionKey€–†....ÿ...ÿ.......€....ÿÿ.DefaultIncludeModelData –†....ÿ...ÿ.......€...€ÿÿ.DefaultVersion.@

Ahhh...now that is interesting. A default encryption key. Notice there are 27 bytes between each of those descriptors and they always end with "ÿÿ." Here are 24 bytes, excluding the "ÿÿ." To me, this is a 192-bit key...but who knows if all 24 of those bytes correspond to the key? Any thoughts?

80 96 86 00 18 00 00 FF 18 00 00 FF 01 00 00 00 00 00 00 80 01 00 00 00
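
For experimenting with these bytes, they can be loaded straight from the hex dump. The sketch below only confirms the length and shows the same bytes as six little-endian 32-bit words; whether they are key material at all is still an open assumption.

```python
import struct

candidate = bytes.fromhex(
    "80 96 86 00 18 00 00 FF 18 00 00 FF "
    "01 00 00 00 00 00 00 80 01 00 00 00"
)

key_bits = len(candidate) * 8

# Viewing the bytes as 32-bit words can help judge whether they look like
# random key material or like structured fields.
words = struct.unpack("<6I", candidate)
for word in words:
    print("0x%08X" % word)
```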

Code Snippets
To save space in this thread, I put this script in my drop-box for download. It reads through the file, extracts basic info from the vertices and facets fields, and prints out a bunch of stuff. You can uncomment the end to have it save a data block into a separate file for easier analysis.
basic_mesh_read.py

This is the code I used to try all "reasonable" variations on the standard base64 library. try_all_b64_tables.py
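
For readers without the script, the general idea of decoding with a reordered base64 table can be sketched like this. It is not the script linked above; `decode_with_alphabet` and `rotations` are hypothetical helpers, and rotating the alphabet is just one family of variations to try.

```python
import base64
import string

STANDARD = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/").encode("ascii")


def decode_with_alphabet(b64_bytes, alphabet):
    """Decode base64 text that was written with a reordered 64-char alphabet."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 distinct characters")
    # Map each custom symbol back to the standard symbol at the same index.
    table = bytes.maketrans(alphabet, STANDARD)
    return base64.b64decode(b64_bytes.translate(table))


def rotations(alphabet):
    """Yield the 64 cyclic rotations of an alphabet."""
    for shift in range(64):
        yield alphabet[shift:] + alphabet[:shift]


# Synthetic round trip: encode with a rotated table, then decode it again.
custom = STANDARD[5:] + STANDARD[:5]
encoded = base64.b64encode(b"mesh data").translate(
    bytes.maketrans(STANDARD, custom))
decoded = decode_with_alphabet(encoded, custom)
```

Each candidate table still needs a plausibility test on the decoded bytes, for example checking that the unpacked floats fall inside the bounding box reported by the viewer.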

patmo141
  • 3
    Are you sure that the encoded values are 32 bit floats? If so, are they represented with [LSB](http://en.wikipedia.org/wiki/Least_significant_bit) or [MSB](http://en.wikipedia.org/wiki/Most_significant_bit)? – campos.ddc Feb 22 '12 at 22:11
  • I'm not completely sure, but I am fairly confident given the ratio of bytes to vertices. As far as LSB or MSB, those are new terms to me, so I'm investigating. It seems like this is the same as endianness but the Wiki article says it's not. So, I need to wrap my head around this a little more. I've tried unpacking both little endian and big endian. – patmo141 Feb 22 '12 at 22:58
  • It is the same as endianness, so at least that's off the table – campos.ddc Feb 22 '12 at 23:13
  • Ok, so this is what I'm going to try next. My original searches didn't catch this [link](http://stackoverflow.com/questions/5537750/decode-base64-like-string-with-different-index-tables) – patmo141 Feb 22 '12 at 23:39
  • 1
    It's not encrypted. Encrypted text looks like random bytes, and I see a lot of repetition. – Maarten Bodewes Feb 23 '12 at 00:12
  • @owlstead thanks for taking a look. Where are you seeing the repetition? I'm leaning toward it using a non-standard base64 translation table. I tried a few basic variations but they were just shots in the dark. I reckon there are 64 factorial variations on the translation table. – patmo141 Feb 23 '12 at 00:57
  • The last part of the base64 certainly looks repetitive to me: ABAgAHjYQAAAEJAAEAAQIAAQEBAAAAAAICAAAAAAICAAEBAAEAAAEAAQABAgABAAABAgIAAAECAgIAAQIAAAABAAEAAQIAAAEAAAAAAAEAAgIAAQICAAABAgABAAEAAAABAgICAAACAgIAAAICAAAAAAAAAAIAAQEAAgABAgAAAQABAQIAAQIAAQIAAAACAgdqhAAAAQcjgwAAAgAAAAEAAAIAAgICAgAAAgABAgACAAAAAQACAAICAAAAAgIAAAAAAgAAAAAAAAACAA... That's a lot of A's :) – Maarten Bodewes Feb 23 '12 at 01:00
  • yeah, I haven't gotten to that part of the data yet. The sample you have provided is in the field which is going to have integers in the range [0, number of vertices]. Since the number of vertices is on the order of 50000 in this particular example, the last two bytes (or first two for big endian) are always going to be zero for every 4 byte integer. I guess I should play with the facet field while I marinate on the vertices some more. – patmo141 Feb 23 '12 at 01:29
  • 1
    Do you have any way of getting some known plain-text? If you knew the actual value of one of the vertices, your job would be much easier. – Rasmus Faber Feb 23 '12 at 11:46
  • 1
    That `check_value` does look suspicious. I tried xoring the data with it but it still results in nonsense values. So it's probably something a little more involved. – Igor Skochinsky Feb 23 '12 at 13:33
  • @Rasmus. I may be able to obtain a correlated file with an open mesh format. What I can do is obtain the bounding box size, and the min/max values in each coordinate using a "viewer" program. I will edit my original post with some information – patmo141 Feb 23 '12 at 15:10
  • @IgorSkochinsky I agree. I was thinking it might be a simple checksum for verification after the decode(crypt?) step. – patmo141 Feb 23 '12 at 15:12
  • This post appears to be growing a little large. Should I split off the Vertices field and the Facets field as separate questions to keep it a little easier to read? – patmo141 Feb 23 '12 at 18:19
  • I attempted to decode the b64 data with all the reasonable b64 tables I could think of without going for the full 64! tables. I simply permuted the uppercase, lowercase, digits and +/ as 4 blocks, and then shifted them one block at a time. So the 2nd script up there tries 6*64 different b64 tables. No dice :-( – patmo141 Feb 24 '12 at 20:41
  • eeek, I may be out of my league here. The 24 byte (192 bit) "Default Key" which came out of the viewer.exe, and the fact that the vertices field occasionally has an extra 4 bytes (which to me indicates some kind of block size in the encryption), both point to real encryption. – patmo141 Feb 25 '12 at 03:56
  • @patmo141: to save reversing the full encryption algorithm, you could hook the "load" method and snag the data after decryption. Alternatively if your viewer was conveniently written in some .NET language, you could use [.NET Reflector](http://www.reflector.net) to rip out the routines. – Leigh Feb 28 '12 at 16:52
  • @RasmusFaber I have acquired a decrypted version of the file. Link added above – patmo141 Mar 02 '12 at 22:12
  • Rather than guessing, why don't you contact the fellow who wrote the file and just ask him how it is encoded? – Ira Baxter Mar 02 '12 at 22:26
  • It's a large company. I've sent an email, but haven't heard much back – patmo141 Mar 02 '12 at 22:46
  • Please rename the question. Currently it is strictly confusing when scanning question list. – Netch Apr 03 '12 at 06:35

1 Answer


I am not sure why you think the results are not floating point numbers. The vertices data in the "decrypted data" you gave, contains as first 4 bytes "f2 01 31 41". Given an LSB byte order, that corresponds to the bit pattern "413101f2", which is the IEEE 754 representation of the float value 11.062973. All the 4 byte values in that file are in that same range, so I assume they all are float values.
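
The conversion described here is easy to reproduce with the standard `struct` module. A small sketch, using only the four bytes quoted above:

```python
import struct

first_four = bytes.fromhex("f2013141")

# Little-endian: the bytes are read as the bit pattern 0x413101F2.
(as_little_endian,) = struct.unpack("<f", first_four)

# Big-endian, for comparison: the same bytes read as 0xF2013141.
(as_big_endian,) = struct.unpack(">f", first_four)

print(as_little_endian, as_big_endian)
```

The little-endian reading comes out at roughly 11.063, while the big-endian reading is a huge negative number, which supports the little-endian interpretation.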

fishinear
  • That is correct, the "decrypted data" is 4byte floats. This particular data was decrypted by a 3rd party application. So, the data in the original file goes from raw floats -> encrypted -> base64encoded. My goal is to go in reverse. Encrypted -> raw floats which I currently haven't been able to do. When I first wrote this thread, I just thought it was a Base64 variation, but more investigation led to the conclusion that it is also encrypted. Apologies for the confusion. – patmo141 Mar 05 '12 at 21:18
  • 1
    Perhaps the "encrypted data" is really just a stream of floats in _binary_? – jpaugh May 17 '12 at 01:58