
I am a fan of the outmoded game Age of Empires II (AoE). I want to write a parser for AoE game records (.mgx files) in Python.

I did some searching on GitHub and found few projects on this; the most useful one is aoc-mgx-format, which provides some details about .mgx game record files.

Here is the problem:

According to the reference, the structure of a .mgx file looks like this:

| header_len(4byte int) | next_pos(4byte int) | header_data | ... ... |

The hex data's byte order in mgx format is little endian.

header_len stores the length of the Header part (header_len + next_pos + header_data).

header_data stores the useful information I need, but it's compressed with zlib.

I tried to decompress the data in header_data with the zlib module like this:

import struct
import zlib

with open('test.mgx', "rb") as fp:
    # read the header_len bytes and convert them to an int representing the length of the Header part
    header_len = struct.unpack("<i", fp.read(4))[0]

    # read next_pos (this is not important for me)
    next_pos = struct.unpack("<i", fp.read(4))[0]

    # then I can get data length of header_data part(compressed with zlib)
    header_data_len = header_len - 8

    compressed_data = fp.read(header_data_len)[::-1] # need to be reversed because byte order is little endian?

    try:
        zlib.decompress(compressed_data)
        print "can be decompressed!"
    except zlib.error as e:
        print e.message

But I got this after running the program:

Error -3 while decompressing data: incorrect header check

PS: Sample .mgx files can be found here: https://github.com/stefan-kolb/aoc-mgx-format/tree/master/parser/recs

lichifeng
  • Data don't need to be reversed because the byte-order is little-endian. You already converted them from little-endian to native by using `" – abarnert Apr 17 '15 at 05:10
  • There is a typo in your question, where you say "outmoded game Age of Empires", I think you mean "wonderfully awesome game Age of Empires". –  Apr 17 '15 at 05:11
  • Anyway, when you fix that problem (by removing the `[::-1]`), that fixes that error, and instead gives you the correct error -3, complaining that EC BD doesn't look like a valid compression method. Since you're usually going to see 78 9C or 78 DA at the start of a valid zlib compressed blob, it may be worth scanning the file for those bytes… – abarnert Apr 17 '15 at 05:20
  • @abarnert thx. i used struct.unpack() only on the first 8 bytes. For **header_data**, I think it needs to be reversed before zlib.decompress(). I tried not reversing it, but still the same problem. – lichifeng Apr 17 '15 at 05:20
  • Why do you think it needs to be reversed? That would be very unusual (and the older the format, the more unusual, because it would be inefficient…), and the reverse-engineered-spec you linked to just says "need to uncompress. (zlib deflate compress)", nothing about reversing it. – abarnert Apr 17 '15 at 05:21
  • Hold on, maybe it's zlib without a zlib header (as in gzip). Let me try something. – abarnert Apr 17 '15 at 05:24
  • @abarnert you are great!!! i googled with "zlib without a zlib" and found some useful! `zlib.decompress(compressed_data, -zlib.MAX_WBITS)` will work – lichifeng Apr 17 '15 at 05:38
  • @lichifeng: Ah, I thought you could only suppress the header by passing -wbits to a decompressor object, not to the `decompress` method too. That's even simpler. :) – abarnert Apr 17 '15 at 05:40
  • @abarnert certainly i will use decompressor object in real project, its just test code above. thanks again, i guess you have played this game, too XDDD – lichifeng Apr 17 '15 at 05:44
  • I think that last comment was for @LegoStormtroopr, not me. :) I have played it, but not for a long time. I like Europa Universalis and Crusader Kings for my strategy fix, so my questions are about writing an iterative parser for human-readable-text-but-300MB files. :) – abarnert Apr 17 '15 at 05:48
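
The decompressor-object approach mentioned in the comments above might look roughly like the sketch below. It does not read a real .mgx file: `build_fake_record` and `read_header` are hypothetical helpers, and the blob is synthetic data laid out as header_len | next_pos | header_data | body, following the layout described in the question.

```python
import struct
import zlib


def build_fake_record(header_payload, body):
    """Build a synthetic blob: header_len | next_pos | raw-deflate header | body."""
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
    compressed = comp.compress(header_payload) + comp.flush()
    header_len = 8 + len(compressed)  # the two 4-byte ints plus the compressed data
    return struct.pack("<ii", header_len, 0) + compressed + body


def read_header(blob, chunk_size=16):
    """Decompress the header in small chunks with a decompressor object."""
    header_len, next_pos = struct.unpack_from("<ii", blob, 0)
    compressed = blob[8:header_len]
    decomp = zlib.decompressobj(-zlib.MAX_WBITS)  # negative wbits: no zlib header
    parts = []
    for start in range(0, len(compressed), chunk_size):
        parts.append(decomp.decompress(compressed[start:start + chunk_size]))
    parts.append(decomp.flush())
    return b"".join(parts), blob[header_len:]


fake_header = b"VER 9.8\x00" + b"x" * 100  # made-up payload
blob = build_fake_record(fake_header, b"BODY")
header, body = read_header(blob)
```

Feeding the data in chunks is only worthwhile for large inputs; for a header of modest size, the one-shot `zlib.decompress` call does the same job.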

2 Answers

Your first problem is that you shouldn't be reversing the data; just get rid of the [::-1].
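
To illustrate why no reversal is needed, here is a small sketch (the four bytes are made up, not taken from a real file). Byte order only describes how a single multi-byte number is laid out, and `struct` handles that through the `<` prefix; a compressed byte stream is not one big number, so it should be passed to zlib as it was read.

```python
import struct

raw = b"\x10\x27\x00\x00"  # four bytes as they might appear on disk (made-up value)

# "<i" tells struct the bytes are little-endian, so no manual reversal is needed.
value = struct.unpack("<i", raw)[0]

# Reversing the bytes and reading them as big-endian yields the same integer,
# which shows that endianness is a property of one number, not of a whole stream.
same_value = struct.unpack(">i", raw[::-1])[0]

print(value, same_value)
```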

But if you do that, instead of getting that error -3, you get a different error -3, usually about an unknown compression method.

The problem is that this is headerless zlib data, much like what gzip uses. In theory, this means the information about the compression method, window, start dict, etc. has to be supplied somewhere else in the file (in gzip's case, by information in the gzip header). But in practice, everyone uses deflate with the max window size and no start dict, so if I were designing a compact format for a game back in the days when every byte counted, I'd just hardcode them. (In modern times, exactly that has been standardized in an RFC as "DEFLATE Compressed Data Format", but most 90s PC games weren't following RFCs by design...)

So:

>>> uncompressed_data = zlib.decompress(compressed_data, -zlib.MAX_WBITS)
>>> uncompressed_data[:8] # version
b'VER 9.8\x00'
>>> uncompressed_data[8:12] # unknown_const
b'\xf6(<A'

So not only did it decompress, the result looks like a version string and… well, I guess anything looks like an unknown constant, but it's the same unknown constant as in the spec, so I think we're good.

As the decompress docs explain, MAX_WBITS is the default/most common window size (and the only size used by what's usually called "zlib deflate" as opposed to "zlib"), and passing a negative value means that the header is suppressed; the other arguments we can leave to defaults.
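
The effect of the negative window size can be checked without a game file. The sketch below is only an illustration: the payload is made-up stand-in data, compressed into a headerless deflate stream and then decompressed both ways.

```python
import zlib

payload = b"VER 9.8\x00" + b"example header bytes " * 20  # made-up stand-in data

# Build a raw deflate stream (no zlib header or checksum) by passing
# a negative wbits value to the compressor.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
raw_deflate = compressor.compress(payload) + compressor.flush()

# With default settings zlib expects a 2-byte header, so this call raises zlib.error.
try:
    zlib.decompress(raw_deflate)
    default_failed = False
except zlib.error:
    default_failed = True

# A negative wbits tells zlib to skip the header and trailer checks.
restored = zlib.decompress(raw_deflate, -zlib.MAX_WBITS)
```

Here `default_failed` ends up `True` and `restored` equals `payload`, mirroring what happens with the header_data section of the file.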

See also this answer, the Advanced Functions section in the zlib docs, and RFC 1951. (Thanks to the OP for finding the links.)

abarnert
  • thx a lot, i found this with some keywords you provided, also useful [http://stackoverflow.com/a/22311297/4799491](http://stackoverflow.com/a/22311297/4799491) – lichifeng Apr 17 '15 at 05:41
  • @lichifeng: I added the links to the answer. Nice find. – abarnert Apr 17 '15 at 05:46

This is an old question, but here is a sample of what I did:

import os
import struct
import zlib
from collections import namedtuple

# Player is not defined in the original snippet; this namedtuple is a stand-in.
Player = namedtuple('Player', ['id', 'type', 'name'])


class GameRecordParser:

    def __init__(self, filename):
        self.filename = filename
        with open(filename, 'rb') as f:
            # Get header size
            header_size = struct.unpack('<I', f.read(4))[0]
            sub = struct.unpack('<I', f.read(4))[0]
            if sub != 0 and sub < os.stat(filename).st_size:
                f.seek(4)
                self.header_start = 4
            else:
                self.header_start = 8

            # Get and decompress header (headerless deflate, hence the negative wbits)
            header = f.read(header_size - self.header_start)
            self.header_data = zlib.decompress(header, -zlib.MAX_WBITS)

            # Get body
            self.body = f.read()

        # Get players data: each record is id, type, name length, then the name bytes
        sep = b'\x04\x00\x00\x00Gaia'
        pos = self.header_data.find(sep) + len(sep)
        self.players = []
        for _ in range(8):
            player_id, player_type, name_size = struct.unpack_from('<III', self.header_data, pos)
            pos += 12
            name = self.header_data[pos:pos + name_size].decode('utf-8')
            pos += name_size
            if player_id < 9:
                self.players.append(Player(player_id, player_type, name))
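
The player loop above reads each record as two little-endian 32-bit integers followed by a length-prefixed name. As a sketch of that pattern in isolation, the hypothetical helper `read_player_record` below parses made-up sample bytes; the field layout is taken from the loop above, not from a verified specification.

```python
import struct


def read_player_record(data, pos):
    """Read (id, type, name) starting at pos; return the record and the new position."""
    player_id, player_type, name_size = struct.unpack_from("<III", data, pos)
    pos += 12  # three 4-byte unsigned ints
    name = data[pos:pos + name_size].decode("utf-8", errors="replace")
    return (player_id, player_type, name), pos + name_size


# Two made-up records packed back to back.
sample = (struct.pack("<III", 1, 2, 5) + b"Alice"
          + struct.pack("<III", 2, 4, 3) + b"Bob")

first, pos = read_player_record(sample, 0)
second, pos = read_player_record(sample, pos)
```

Returning the updated position alongside the record keeps the caller's loop free of manual offset arithmetic.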

Hope it helps future programmers :)

By the way, I am planning on writing such a library.

Victor Drouin