
So I made a short Python script to launch files in Windows with ambiguous extensions by examining their magic number/file signature first:

I'd like to compile it to a .exe to make association easier (either using bbfreeze or rewriting in C), but I need some kind of user-friendly config file to specify the matching byte strings and program paths. Basically I want to put this information into a plain text file somehow:

magic_numbers = {
# TINA
'OBSS': r'%PROGRAMFILES(X86)%\DesignSoft\Tina 9 - TI\TINA.EXE',

# PSpice
'*version': r'%PROGRAMFILES(X86)%\Orcad\Capture\Capture.exe', 
'x100\x88\xce\xcf\xcfOrCAD ': '', #PSpice?

# Protel
'DProtel': r'%PROGRAMFILES(X86)%\Altium Designer S09 Viewer\dxp.exe', 

# Eagle
'\x10\x80': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
'\x10\x00': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
'<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE eagle ': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',

# PADS Logic
'\x00\xFE': r'C:\MentorGraphics\9.3PADS\SDD_HOME\Programs\powerlogic.exe', 
}

(The hex bytes are just arbitrary bytes, not Unicode characters.)
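Here is a minimal sketch of how a table like this might be used for the lookup. It assumes Python 3 with the keys stored as `bytes`, and that the longest matching prefix should win. `SIGNATURES`, `find_launcher` and `launcher_for_file` are hypothetical names for illustration, not the original script:

```python
# Hypothetical signature table: a few entries from the dict above, with bytes keys (Python 3)
SIGNATURES = {
    b"DProtel": r"%PROGRAMFILES(X86)%\Altium Designer S09 Viewer\dxp.exe",
    b"\x10\x80": r"%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe",
    b"\x10\x00": r"%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe",
    b"\x00\xfe": r"C:\MentorGraphics\9.3PADS\SDD_HOME\Programs\powerlogic.exe",
}


def find_launcher(header, signatures=SIGNATURES):
    """Return the launcher whose signature is the longest prefix of `header`, or None."""
    # Try longer signatures first so a short one can't shadow a more specific match
    for magic in sorted(signatures, key=len, reverse=True):
        if header.startswith(magic):
            return signatures[magic]
    return None


def launcher_for_file(path, signatures=SIGNATURES):
    # Read only as many bytes as the longest signature needs
    nbytes = max(len(magic) for magic in signatures)
    with open(path, "rb") as f:
        return find_launcher(f.read(nbytes), signatures)
```

Environment variables such as `%PROGRAMFILES(X86)%` are left unexpanded here. On Windows, `os.path.expandvars` would expand them before launching.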

I guess a .py file in this format would work, but then I'd have to leave it uncompiled and somehow still import it into the compiled program, and there's still a lot of extraneous syntax like { and , for users to get confused by or break.

I looked at YAML, and it would be great, except that its binary type requires base64-encoding the data first, which isn't really what I want. I'd prefer the config file to contain hex representations of the bytes, but also ASCII representations if that's all the file signature is, and maybe also regexes. :D (For instance, in case the XML-based format can be written with different amounts of whitespace.)
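One way such a file could look is one rule per line with a kind prefix (hex, ascii or regex). The sketch below assumes a made-up `kind pattern => launcher` line format. `parse_rules` and `match_header` are hypothetical helpers, and the regex rule matches the Eagle XML header regardless of whitespace between the two tags:

```python
import re

# Hypothetical plain-text format: "<kind> <pattern> => <launcher>"; '#' lines are comments.
# kind is one of: hex (space-separated byte values), ascii (literal text), regex (matched at offset 0)
SAMPLE_CONFIG = r"""
# kind  pattern                             launcher
hex     10 80                            => C:\EAGLE\eagle.exe
ascii   DProtel                          => C:\Altium\dxp.exe
regex   <\?xml[^>]*>\s*<!DOCTYPE eagle   => C:\EAGLE\eagle.exe
"""


def parse_rules(text):
    """Turn each config line into a (compiled bytes regex, launcher) pair."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rule, launcher = line.rsplit(" => ", 1)
        kind, pattern = rule.split(None, 1)
        pattern = pattern.strip()
        if kind == "hex":
            # bytes.fromhex("10 80") -> b"\x10\x80"; escape so the bytes match literally
            matcher = re.compile(re.escape(bytes.fromhex(pattern)))
        elif kind == "ascii":
            matcher = re.compile(re.escape(pattern.encode("ascii")))
        elif kind == "regex":
            matcher = re.compile(pattern.encode("ascii"))
        else:
            raise ValueError("unknown rule kind: %r" % kind)
        rules.append((matcher, launcher.strip()))
    return rules


def match_header(header, rules):
    # The first rule whose pattern matches at the start of the file wins
    for matcher, launcher in rules:
        if matcher.match(header):
            return launcher
    return None
```

Because patterns are stripped, an ASCII signature with meaningful trailing spaces (such as `OrCAD `) would need to be written in the hex form.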

Any ideas?

endolith
  • have you looked at [python-magic](https://github.com/ahupp/python-magic)? – jterrace Mar 13 '12 at 16:18
  • @jterrace: No, but 1. It's probably overkill if you're only dealing with one file extension? 2. libmagic probably doesn't recognize the formats I care about anyway? [Their file format](http://linux.die.net/man/5/magic) seems relevant, though. I didn't realize they handle regexes, too. – endolith Mar 13 '12 at 16:27
  • what's nice about libmagic though is that you can add your own custom formats, so you could easily extend it with yours – jterrace Mar 13 '12 at 16:38
  • @jterrace: Yeah, confirmed that the [file command](http://linux.die.net/man/1/file) just says that these are "data" and doesn't know any more than that. It's possible to add custom formats, but then I'm just accomplishing the same thing as I've already accomplished. – endolith Mar 13 '12 at 16:45
  • Yeah, I think I would decouple the two config files: one for associating magic numbers to mimetypes and another for associating mimetypes with launcher paths – jterrace Mar 13 '12 at 17:04
  • @jterrace: Do these files even have mimetypes? – endolith Mar 13 '12 at 17:11
  • That's what libmagic gives you. If the files don't have official mime types, people usually just make one up – jterrace Mar 13 '12 at 17:13
  • @jterrace: Can you suggest that as an answer? – endolith Mar 13 '12 at 19:41

3 Answers


You've already got your answer: YAML.

The data you posted above stores text representations of binary data. That will work fine in YAML; you just need to decode it properly. Usually you'd use something from the binascii module; in this case, probably the binascii.a2b_qp (quoted-printable decoding) function.

import binascii

magic_id_str = 'x100\x88\xce\xcf\xcfOrCAD '
magic_id = binascii.a2b_qp(magic_id_str)
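In Python 3 the same two functions work with `bytes`. `a2b_qp` also accepts an ASCII-only `str`, and printable ASCII passes through unchanged, so text and `=XX` hex escapes can be mixed in one value. This is a small round-trip sketch; the variable names are illustrative only:

```python
import binascii

# Quoted-printable text as it might appear in a config file
qp_text = "=CE=A6"
raw = binascii.a2b_qp(qp_text)        # ASCII str in, bytes out
print(raw)                            # b'\xce\xa6', the UTF-8 bytes of 'Φ'

# Going the other way: bytes -> quoted-printable text for the config file
encoded = binascii.b2a_qp(raw)
print(encoded)                        # b'=CE=A6'

# Plain ASCII needs no escaping, so a text signature can be written as-is
mixed = binascii.a2b_qp("DProtel=00")
print(mixed)                          # b'DProtel\x00'
```

A literal `=` in a signature has to be written as `=3D`, since `=` introduces an escape in quoted-printable.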

To illustrate, I'll use a Unicode character as an easy way to paste binary data into the REPL (Python 2.7):

>>> import binascii, yaml
>>> a = 'Φ'
>>> a
'\xce\xa6'
>>> binascii.b2a_qp(a)
'=CE=A6'
>>> magic_text = yaml.load("""
... magic_string: '=CE=A6'
... """)
>>> magic_text
{'magic_string': '=CE=A6'}
>>> binascii.a2b_qp(magic_text['magic_string'])
'\xce\xa6'
Peter V
  • Wait, so how would you write this in the config file and use it with `yaml.load`? Now I'm thinking the format could be slightly more complex, with a specifier for the type of data, and you could enter either something like `[string, DProtel]` or `[hex, 88 ce cf cf]` and the program would handle each differently. – endolith Mar 13 '12 at 18:23
  • I've got this working implicitly now with `yaml.add_implicit_resolver`. Anything in the form `88 ce cf c4` is converted to binary, no tags needed. Anything else is interpreted as a string. – endolith Mar 14 '12 at 01:00
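The rule from the last comment can be sketched in plain Python without PyYAML: space-separated hex byte pairs become binary, and anything else stays ASCII text. The regex and the `to_signature` name are assumptions. With PyYAML, a regex like this would be what gets registered with `yaml.add_implicit_resolver`, together with a constructor for the custom tag:

```python
import re

# Hypothetical rule: two-digit hex bytes separated by single spaces, e.g. "88 ce cf c4"
HEX_BYTES = re.compile(r"[0-9A-Fa-f]{2}(?: [0-9A-Fa-f]{2})*")


def to_signature(value):
    """Decode hex-looking values to bytes; treat everything else as literal ASCII text."""
    if HEX_BYTES.fullmatch(value):
        return bytes.fromhex(value)
    return value.encode("ascii")
```

A short ASCII signature that happens to look like hex (such as `ab`) would be decoded as a byte. In PyYAML, implicit resolvers only apply to plain (unquoted) scalars, so quoting such a value should keep it a string.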

I would suggest doing this a little differently. I would decouple these two settings from each other:

  1. Magic number signature ==> mimetype
  2. mimetype ==> program launcher
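The decoupling can be sketched with two plain dictionaries, with the mimetype as the only link between them. The `application/x-...` names below are invented for illustration, as the comments suggest doing for formats without an official mimetype. In the design described here, libmagic would handle stage 1:

```python
# Stage 1 (hypothetical): signature -> mimetype, with invented "application/x-..." names
SIGNATURE_TO_MIME = {
    b"\x10\x80": "application/x-eagle",
    b"DProtel": "application/x-protel",
}

# Stage 2 (hypothetical): mimetype -> launcher; plain text only, so any simple format works
MIME_TO_LAUNCHER = {
    "application/x-eagle": r"C:\EAGLE-5.11.0\bin\eagle.exe",
    "application/x-protel": r"C:\Altium Designer S09 Viewer\dxp.exe",
}


def detect_mime(header):
    for magic, mime in SIGNATURE_TO_MIME.items():
        if header.startswith(magic):
            return mime
    return None


def launcher_for(header):
    # The tables never refer to each other; an unknown mimetype (or None) gives None
    return MIME_TO_LAUNCHER.get(detect_mime(header))
```

Changing which program opens a format then only touches the second table, and adding a signature only touches the first.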

For the first part, I would use python-magic, a library that has bindings to libmagic. You can have python-magic use a custom magic file like this:

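import magic
# mime=True makes from_file() return a mimetype such as 'image/tiff' instead of a description
m = magic.Magic(mime=True, magic_file='/path/to/magic.file')
mime = m.from_file('/path/to/some/file')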

Your users can specify a custom magic file mapping magic numbers to mimetypes. The syntax of magic files is documented in the magic(5) man page. Here's an example showing the magic file entries for the TIFF format:

# Tag Image File Format, from Daniel Quinlan (quinlan@yggdrasil.com)
# The second word of TIFF files is the TIFF version number, 42, which has
# never changed.  The TIFF specification recommends testing for it.
0       string          MM\x00\x2a      TIFF image data, big-endian
!:mime  image/tiff
0       string          II\x2a\x00      TIFF image data, little-endian
!:mime  image/tiff
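For comparison, here is what those two rules check, written out by hand. The first two bytes give the byte order ("MM" or "II"), and the next 16-bit word must be 42 (0x2a) in that byte order. `tiff_info` is a hypothetical helper, not part of libmagic or python-magic:

```python
import struct


def tiff_info(header):
    """Hand-written equivalent of the two TIFF magic rules above (both at offset 0)."""
    byte_order = header[:2]
    if byte_order == b"MM":
        fmt, desc = ">H", "big-endian"
    elif byte_order == b"II":
        fmt, desc = "<H", "little-endian"
    else:
        return None
    # The second 16-bit word is the TIFF version number, 42, in the file's own byte order
    if len(header) < 4 or struct.unpack(fmt, header[2:4])[0] != 42:
        return None
    return "TIFF image data, " + desc, "image/tiff"
```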

The second part is then pretty easy, since you only need to specify text data. You could go with an INI or YAML format, as suggested by others, or even just a simple tab-delimited file like this:

image/tiff         C:\Program Files\imageviewer.exe
application/json   C:\Program Files\notepad.exe
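Reading a file like this needs only the standard library. The sketch below splits each line on its first run of whitespace, so it accepts tabs or spaces and keeps spaces inside the launcher path. `load_launchers` is a hypothetical name, and skipping `#` comments is an added assumption:

```python
def load_launchers(text):
    """Parse 'mimetype<whitespace>launcher' lines into a dict; blank and '#' lines are skipped."""
    launchers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first whitespace run only, since launcher paths may contain spaces
        mime, path = line.split(None, 1)
        launchers[mime] = path.strip()
    return launchers
```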
jterrace

I've used several packages for configuration files, including YAML. I recommend that you use ConfigParser or ConfigObj.

Of the two, if you want a human-readable configuration file with comments, I strongly recommend ConfigObj.

Enjoy!
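ConfigParser ships with Python (`configparser` in Python 3). Here is a sketch of using it for the question's data, assuming each signature is stored as space-separated hex and decoded with `bytes.fromhex`. The section names, keys and `load_signatures` are made up for illustration:

```python
import configparser

# Hypothetical INI layout: one section per format, signature stored as hex text
SAMPLE_INI = r"""
[Eagle]
# space-separated hex bytes
signature = 10 80
launcher = %PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe

[Protel]
signature = 44 50 72 6f 74 65 6c
launcher = %PROGRAMFILES(X86)%\Altium Designer S09 Viewer\dxp.exe
"""


def load_signatures(text):
    # interpolation=None keeps %PROGRAMFILES(X86)% literal instead of treating % as syntax
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    table = {}
    for name in parser.sections():
        section = parser[name]
        table[bytes.fromhex(section["signature"])] = section["launcher"]
    return table
```

The Protel signature is just the ASCII text `DProtel` written as hex.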

Example of ConfigObj

You can use ConfigObj to store them too. With this code:

import configobj

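def createConfig(path):
    config = configobj.ConfigObj()
    config.filename = path           # write() saves to this path
    config["Sony"] = {}              # a dict becomes a [Sony] section
    config["Sony"]["product"] = "Sony PS3"
    config["Sony"]["accessories"] = ['controller', 'eye', 'memory stick']  # written comma-separated
    config["Sony"]["retail price"] = "$400"
    config["Sony"]["binary one"] = bin(173)  # bin() returns the string '0b10101101'
    config.write()

createConfig("config.ini")  # example path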

You get this file:

[Sony]
product = Sony PS3
accessories = controller, eye, memory stick
retail price = $400
binary one = 0b10101101
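Because `bin(173)` is stored as the text `0b10101101`, it comes back as a string when read, and `int(value, 0)` understands the `0b` prefix. Here is a stdlib-only sketch that reads the file above with `configparser`. ConfigObj's own reader would also turn the comma-separated `accessories` value back into a list; plain configparser leaves it as text:

```python
import configparser

# The file produced by createConfig() above, as text
GENERATED = """
[Sony]
product = Sony PS3
accessories = controller, eye, memory stick
retail price = $400
binary one = 0b10101101
"""

parser = configparser.ConfigParser()
parser.read_string(GENERATED)
sony = parser["Sony"]

binary_one = int(sony["binary one"], 0)   # base 0 honours the 0b prefix
# configparser has no list type, so split the comma-separated value by hand
accessories = [item.strip() for item in sony["accessories"].split(",")]
```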
carlesh