
I have a txt file with a stream of HEX data, and I would like to convert it to binary format in order to save space on the disk.

This is my simple script, just to test the decoding and the binary storage:

hexstr = "12ab"

with open('outputfile.bin', 'wb') as of:
    for i in hexstr:
        # this is how I convert an ASCII char to a 7-bit representation
        x = '{0:07b}'.format(ord(i))
        # Python 3 needs bytes for a 'wb' file; each '0'/'1' char still takes one byte
        of.write(x.encode('ascii'))

I expected outputfile.bin to have a size of 28 bits; instead, the result is 28 bytes. I guess the problem is that x is a string and not a bit sequence.

What should I do?

Thanks in advance

user2944566
  • Yep, Python's `fileobject.write` is expecting a string, so it probably also writes it as a string. – aIKid Nov 01 '13 at 13:51
  • I think the binascii module is probably what you are looking for... – gtgaxiola Nov 01 '13 at 13:51
  • Have you seen [this](http://stackoverflow.com/q/2452861/149530) question? Note that storing 7 bits for each ASCII character will only save you 1 octet every 8 characters, possibly less depending on your file system. – Michael Foukarakis Nov 01 '13 at 14:13
  • What is _"a stream of HEX data"_? What does _"a txt file with a stream of HEX data"_ mean? – eyquem Nov 01 '13 at 14:45
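To make the 7-bit idea from the comments concrete, here is a minimal sketch that packs the question's '0'/'1' string into real bits. The helper names `pack_7bit` and `unpack_7bit` are hypothetical (not from any library); the sketch assumes pure ASCII input and pads the last byte with zero bits, since files are made of whole bytes:

```python
def pack_7bit(text):
    """Pack each ASCII character into 7 bits; pad the tail to a whole byte."""
    if any(ord(c) > 127 for c in text):
        raise ValueError("only 7-bit ASCII characters can be packed")
    # Same '0'/'1' string the question builds: one 7-char group per character.
    bits = "".join("{0:07b}".format(ord(c)) for c in text)
    if not bits:
        return b""
    # Files hold whole bytes, so round the bit count up to a multiple of 8.
    nbytes = (len(bits) + 7) // 8
    bits = bits.ljust(nbytes * 8, "0")
    # Read the bit string as one big integer and emit it as raw bytes.
    return int(bits, 2).to_bytes(nbytes, "big")


def unpack_7bit(data, count):
    """Inverse of pack_7bit; count is the number of characters that were packed."""
    bits = "".join("{0:08b}".format(b) for b in data)
    return "".join(chr(int(bits[i:i + 7], 2)) for i in range(0, count * 7, 7))


for s in ("12ab", "12ab34cd"):
    packed = pack_7bit(s)
    assert unpack_7bit(packed, len(s)) == s
    print(s, "->", len(packed), "bytes")
```

For "12ab" the 28 bits still round up to 4 bytes, so nothing is saved; for 8 characters, 56 bits fit in 7 bytes, which matches the 1-octet-per-8-characters estimate in the comments. The unpacker needs the character count because the padding bits are otherwise indistinguishable from data.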

2 Answers


Is this what you want? "12ab" should be written as \x01\x02\x0a\x0b, right?

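import struct

hexstr = "12ab"

# struct.pack returns bytes, so the file must be opened in binary mode
with open('outputfile.bin', 'wb') as of:
    for i in hexstr:
        # one byte per hex digit: '1' -> b'\x01', 'a' -> b'\x0a'
        of.write(struct.pack('B', int(i, 16)))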
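Note that this loop writes one full byte per hex digit, so "12ab" still occupies 4 bytes on disk. A sketch of a variant that keeps the `struct` approach but combines two digits into each byte, high nibble first (`hex_to_bytes` is a hypothetical helper name, and an even number of digits is assumed):

```python
import struct


def hex_to_bytes(hexstr):
    """Pack two hex digits into each output byte (high nibble first)."""
    if len(hexstr) % 2:
        raise ValueError("need an even number of hex digits")
    out = bytearray()
    for i in range(0, len(hexstr), 2):
        hi = int(hexstr[i], 16)      # first digit -> upper 4 bits
        lo = int(hexstr[i + 1], 16)  # second digit -> lower 4 bits
        out += struct.pack('B', (hi << 4) | lo)
    return bytes(out)


data = hex_to_bytes("12ab")
print(data, len(data))
```

The resulting bytes can be written to a file opened in `'wb'` mode exactly as in the loop above, giving 2 bytes instead of 4.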
satoru

First of all, you will not get a file size that is not a multiple of 8 bits on any popular platform.

Second, you really have to brush up on what "binary" actually means. You are confusing two different concepts: representing a number in the binary number system, and writing out data in a "non-human-readable" form.

Actually, you are confusing two even more fundamental concepts: data and the representation of data. "12ab" is a representation of the four bytes in memory, as is "\x31\x32\x61\x62".
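A quick check of that distinction in Python 3 (a sketch using only the built-ins `str.encode` and `bytes.fromhex`): the text "12ab" stored as ASCII is the four bytes `\x31\x32\x61\x62`, while reading the same characters as hex digits yields different data, only two bytes long.

```python
text = "12ab"

# The four characters, stored as ASCII, are exactly these four bytes.
as_ascii = text.encode("ascii")
assert as_ascii == b"\x31\x32\x61\x62"  # two spellings of the same data
print(list(as_ascii))                    # [49, 50, 97, 98]

# Interpreting the same characters as hex digits gives different data: two bytes.
as_hex_digits = bytes.fromhex(text)
print(as_hex_digits, len(as_hex_digits))  # b'\x12\xab' 2
```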

Your problem is that the x strings you write contain 28 bytes of data in total, which can be represented either as "0110001011001011000011100010" or as "\x30\x31\x31\x30\x30...\x30\x30\x31\x30".

Maybe this will help you:

>>> hexstr = "12ab"
>>> len(hexstr)
4
>>> ['"%s": %x' % (c, ord(c)) for c in hexstr]
['"1": 31', '"2": 32', '"a": 61', '"b": 62']

>>> i = 42
>>> hex(i)
'0x2a'
>>> x = '{0:07b}'.format(i)
>>> x
'0101010'
>>> [hex(ord(c)) for c in x]
['0x30', '0x31', '0x30', '0x31', '0x30', '0x31', '0x30']
>>> hex(ord('0')), hex(ord('1'))
('0x30', '0x31')

>>> import binascii
>>> [hex(b) for b in binascii.unhexlify(hexstr)]
['0x12', '0xab']

That said, the binascii module has a function you can use:

import binascii

hexstr = "12ab"
data = binascii.unhexlify(hexstr)  # b'\x12\xab': two hex digits per byte
with open('outputfile.bin', 'wb') as f:
    f.write(data)

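This stores each pair of hex digits in one 8-bit byte, so "12ab" becomes 2 bytes instead of 4. That is already smaller than packing each ASCII character into 7 bits, and 7-bit packing is usually not worth the effort as a compression scheme anyway.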
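Since the question actually starts from a txt file, here is a minimal sketch of the whole conversion. The function name `hex_file_to_bin` and the file names are hypothetical; it assumes the file fits in memory and contains only hex digits plus whitespace, which it discards:

```python
import binascii
import os
import tempfile


def hex_file_to_bin(src_path, dst_path):
    """Convert a text file of hex digits into a binary file (2 digits -> 1 byte).

    Assumes the whole file fits in memory and holds only hex digits plus
    whitespace (spaces, newlines), which are dropped before decoding.
    """
    with open(src_path, "r") as f:
        hex_text = "".join(f.read().split())  # remove all whitespace
    data = binascii.unhexlify(hex_text)       # raises binascii.Error on bad input
    with open(dst_path, "wb") as f:
        f.write(data)
    return len(hex_text), len(data)


# Demo with hypothetical file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    dst = os.path.join(tmp, "outputfile.bin")
    with open(src, "w") as f:
        f.write("12ab\n34cd\n")
    chars, nbytes = hex_file_to_bin(src, dst)
    print(chars, "hex digits ->", nbytes, "bytes")  # 8 hex digits -> 4 bytes
    with open(dst, "rb") as f:
        print(binascii.hexlify(f.read()))           # b'12ab34cd'
```

Going back with `binascii.hexlify` always produces lowercase digits, so an uppercase input file will not round-trip character-for-character, although the decoded bytes are the same.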