
Is there a simple way in Python to read a file's bytes into a list of hexadecimal strings, say hex?

So hex would be this:

hex = ['AA','CD','FF','0F']

I don't want to have to read into a string, then split. This is memory intensive for large files.

Joseph
  • You have a tuple there, not a list. – jsfan Feb 19 '16 at 22:24
  • [http://stackoverflow.com/questions/3964245/convert-file-to-hex-string-python](http://stackoverflow.com/questions/3964245/convert-file-to-hex-string-python) This isn't an exact answer but will probably push you in the right direction. – James H Feb 19 '16 at 22:26
  • Do you want the file data as strings or as integers? Your sample output is each byte as a string of two hex characters, but this seems less useful than a list of integers. – e0k Feb 19 '16 at 22:31
  • this may be a useful [Q&A](http://stackoverflow.com/questions/1035340/reading-binary-file-in-python-and-looping-over-each-byte) – Pynchia Feb 19 '16 at 22:54

3 Answers

s = "Hello"
hex_list = ["{:02x}".format(ord(c)) for c in s]

Output

['48', '65', '6c', '6c', '6f']

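Just change s to the contents of your file and you should be good. Open the file in binary mode so that arbitrary bytes are not decoded as text; wrapping the data in bytearray yields integers on both Python 2 and Python 3.

with open('/path/to/some/file', 'rb') as fp:
    hex_list = ["{:02x}".format(b) for b in bytearray(fp.read())]

Or, if you do not want to keep the whole list in memory at once for large files, use a generator expression instead. Note that fp.read() still loads the whole file; only the list of strings is avoided.

with open('/path/to/some/file', 'rb') as fp:
    hex_list = ("{:02x}".format(b) for b in bytearray(fp.read()))

To get the values one at a time, keep calling

next(hex_list)

To get all the remaining values from the generator, call

list(hex_list)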
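
As a minimal sketch of how such a generator behaves when consumed, the snippet below uses an in-memory byte string instead of a file (the name hex_gen is only illustrative). Passing a default to next avoids a StopIteration once the generator is exhausted.

```python
# Generator of two-character hex strings over an in-memory byte string.
hex_gen = ("{:02x}".format(b) for b in bytearray(b"Hello"))

first = next(hex_gen)        # takes only the first value
rest = list(hex_gen)         # drains whatever is left
done = next(hex_gen, None)   # default is returned instead of raising StopIteration

print(first, rest, done)
```

A plain for loop over the generator handles exhaustion automatically and is usually simpler than repeated next calls.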
OneCricketeer
  • Just making sure - this won't be slow with large files? – Joseph Feb 19 '16 at 22:29
  • You could make `hex_list` a generator if you are that concerned about it – OneCricketeer Feb 19 '16 at 22:30
  • Change the `[ ]` to `( )`. With brackets it is a list comprehension; change to parentheses and you have a generator expression. – OneCricketeer Feb 19 '16 at 22:31
  • @bytec0de Make sure to put the `next` calls inside a `try..except StopIteration`. – Steinar Lima Feb 19 '16 at 22:43
  • excuse my ignorance: isn't `fp.read()` reading the whole input file in memory anyway? Also: if the file is a pure sequence of binary data (I've just tried with the given numbers), you might get the error `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 0: invalid start byte`, since you open the file in `text` mode. I am using python 3 – Pynchia Feb 19 '16 at 23:19
  • @Pynchia True on reading the file into memory, but with a list you'd double the memory use, holding the list as well as the file content. Also, not sure about the text mode vs binary mode, didn't try – OneCricketeer Feb 20 '16 at 00:38

Using Python 3, let's assume the input file contains the sample bytes you show. For example, we can create it like this:

>>> filename = 'sample.bin'  # hypothetical path; use any writable location
>>> inp = bytes((170,12*16+13,255,15)) # i.e. b'\xaa\xcd\xff\x0f'
>>> with open(filename,'wb') as f:
...     f.write(inp)

Since we want the hex representation of each byte in the input file, we should open the file in binary mode, so that Python does not try to decode its contents as text. Otherwise we might trip on the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 0: invalid start byte.

>>> with open(filename,'rb') as f:
...     buff = f.read() # it reads the whole file into memory
...
>>> buff
b'\xaa\xcd\xff\x0f'
>>> out_hex = ['{:02X}'.format(b) for b in buff]
>>> out_hex
['AA', 'CD', 'FF', '0F']

If the file is large, we might want to read it one byte at a time or in chunks. For that purpose I recommend reading this Q&A
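
Reading in chunks could look something like the sketch below. It is only one possible approach, not taken from the linked Q&A; the function name iter_hex_bytes, the chunk_size default, and the temporary sample file are assumptions made for illustration.

```python
import os
import tempfile


def iter_hex_bytes(path, chunk_size=4096):
    """Yield each byte of the file as a two-character uppercase hex string."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size bytes are held at once
            if not chunk:               # an empty result means end of file
                break
            for b in bytearray(chunk):  # bytearray yields ints on Python 2 and 3
                yield '{:02X}'.format(b)


# Write the sample bytes to a temporary file, then read them back.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xaa\xcd\xff\x0f')

# A deliberately small chunk size forces more than one read.
sample_hex = list(iter_hex_bytes(sample_path, chunk_size=3))
os.remove(sample_path)
print(sample_hex)
```

Iterating over iter_hex_bytes(path) directly, rather than wrapping it in list, keeps memory use bounded by the chunk size regardless of how large the file is.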

Pynchia

Be aware that for viewing hexadecimal dumps of files, there are utilities available on most operating systems. If all you want to do is hex dump the file, consider one of these programs:

  • od (octal dump, which has a -x or -t x option)
  • hexdump
  • xd utility available under Windows
  • Online hex dump tools, such as this one.
aghast