5

It's been a long day and I'm a bit stumped.

I'm reading a binary file that contains lots of wide-char strings and I want to dump these out as Python unicode strings. (To unpack the non-string data I'm using the struct module, but I don't know how to do the same with the strings.)

For example, reading the word "Series":

myfile = open("test.lei", "rb")
myfile.seek(44)
data = myfile.read(12)

# data is now 'S\x00e\x00r\x00i\x00e\x00s\x00'

How can I decode that raw wide-char data into a Python unicode string?

Edit: I'm using Python 2.6

Mikesname
  • `file` isn't supposed to be used to open files; `open` is. `codecs.open` is great if this is really a text file but one in a somewhat weird encoding. – Mike Graham Apr 30 '10 at 17:43
  • Mike G - quite right, I've corrected the example. Actually I normally use 'open', but something was screwy with my ipython shell today and it gave me an obscure error. I'd probably overwritten it with something else. – Mikesname Apr 30 '10 at 23:34

4 Answers

8
>>> data = 'S\x00e\x00r\x00i\x00e\x00s\x00'
>>> data.decode('utf-16')
u'Series'
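
Tying this back to the file-reading code in the question, here is a minimal sketch. The helper name read_wide_string and the in-memory stand-in for test.lei are made up for illustration. It names the byte order explicitly with 'utf-16-le', on the assumption that the file stores little-endian wide chars as in the sample bytes; as far as I know, plain 'utf-16' falls back to the platform's native byte order when there is no BOM. The b'' prefix is only there so the snippet also runs on Python 3.

```python
import io

def read_wide_string(f, offset, nbytes):
    """Hypothetical helper: read nbytes at offset and decode as little-endian UTF-16."""
    f.seek(offset)
    raw = f.read(nbytes)
    return raw.decode('utf-16-le')

# Stand-in for the real file: 44 bytes of padding, then "Series" as wide chars.
fake_file = io.BytesIO(b'\x00' * 44 + b'S\x00e\x00r\x00i\x00e\x00s\x00')

series = read_wide_string(fake_file, 44, 12)
print(repr(series))
```

With a real file, open it with open("test.lei", "rb") and pass that file object instead of the BytesIO stand-in.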
interjay
3

I also recommend calling rstrip('\x00') after decoding to remove any trailing '\x00' characters, unless, of course, you need to keep them.

>>> data = 'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'
>>> print '"%s"' % data.decode('utf-16').rstrip('\x00')
"Some Data"

Without rstrip('\x00') the result keeps the trailing NUL characters, which a terminal may display as spaces:

"Some Data  "
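
To make the trailing characters visible, it can help to look at repr() and the lengths rather than the printed text. This is just a sketch using the same sample bytes; the explicit 'utf-16-le' and the b''/u'' prefixes are my choices so that it behaves the same on Python 2.6 and Python 3.

```python
data = b'S\x00o\x00m\x00e\x00\x20\x00D\x00a\x00t\x00a\x00\x00\x00\x00\x00'

decoded = data.decode('utf-16-le')    # still ends with two NUL characters
stripped = decoded.rstrip(u'\x00')    # NUL padding removed

print(repr(decoded))
print(repr(stripped))
print(len(decoded) - len(stripped))   # number of NUL characters removed
```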
Delimitry
2

If the string is known not to contain any characters beyond U+00FF (and the data is little-endian, as here), another possibility is to drop the zero bytes by slicing, which yields a plain str rather than a unicode object:

>>> 'S\x00e\x00r\x00i\x00e\x00s\x00'[::2]
'Series'
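
A small sketch of where the slice works and where it silently goes wrong. The second sample string is made up for illustration, and the b''/u'' prefixes are only there so the snippet runs on both Python 2.6 and Python 3. For characters up to U+00FF the slice gives the same bytes as decoding and re-encoding as Latin-1; beyond that the high bytes are simply thrown away.

```python
ascii_data = b'S\x00e\x00r\x00i\x00e\x00s\x00'

# Both routes give the same byte string when every character is <= U+00FF.
sliced = ascii_data[::2]
via_codec = ascii_data.decode('utf-16-le').encode('latin-1')
print(sliced == via_codec)

# Made-up sample with characters above U+00FF (U+0141 and U+017A).
wide_data = u'\u0141\xf3d\u017a'.encode('utf-16-le')

broken = wide_data[::2]                    # high bytes discarded, text corrupted
roundtrip = wide_data.decode('utf-16-le')  # decoding keeps every character
print(repr(broken))
print(repr(roundtrip))
```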
kismet
0

Hmm, why do you say "open" is preferable to "file"? I see in the reference (Python 2.5):

3.9 File Objects

File objects are implemented using C's stdio package and can be created with the built-in constructor file() described in section 2.1, "Built-in Functions." (3.6)

Footnote (3.6): file() is new in Python 2.2. The older built-in open() is an alias for file().

Nas Banov