I'm trying to write a program that extracts specific EXIF information from a JPEG without using PIL or similar libraries. I'm reading the file in binary mode, but the output is slightly confusing:

file = open("/Users/Niko/Desktop/IMG.JPG", "rb")
print(file.read(16))

Which outputs:

b'\xff\xd8\xff\xe1/\xfeExif\x00\x00MM\x00*\x00\x00\x00\x08\x00\x0b\x01\x0f\x00\x02\x00\x00\x00\x06\x00\x00'

What confuses me is what the "\", "/", and "*" mean. I know that the first few bytes that signify it's a JPEG are 0xFF 0xD8, so I take it the \s are 0s? Can anyone help me understand this?

Apologies for any beginner's mistakes; I'm new to coding in general and just jumped straight into creating this program.

  • Why not use the Python Imaging Library (PIL), or its modern repackaging, Pillow? That library supports reading EXIF out of the box. – Martijn Pieters Nov 06 '13 at 16:25
  • Python byte representations print any character outside of the printable ASCII range as a `\xHH` hexadecimal escape code. `\xff` is a byte with value 255 (hex FF), while `M` is a byte with the same value as the ASCII codepoint for the capital letter `M`, 77 (hex 4D). `/` is 47 (hex 2F), `*` is 42 (hex 2A). – Martijn Pieters Nov 06 '13 at 16:27
  • 0xff - hex notation. \xff - hex notation within a string. – Karoly Horvath Nov 06 '13 at 16:27
  • As @MartijnPieters said, you're printing escaped characters. You probably want to `print(file.read(16).encode('hex'))` instead – loopbackbee Nov 06 '13 at 16:28
  • Mostly because I'm using this as more of a way to learn about EXIF and binary, so that I can create a web based version –  Nov 06 '13 at 16:29
  • possible duplicate of [In Python, how do I read the exif data for an image?](http://stackoverflow.com/questions/4764932/in-python-how-do-i-read-the-exif-data-for-an-image) – Jaime Soriano Nov 06 '13 at 16:30
  • @goncalopp: That won't work on Python 3; the `bytes` type has *no* `.encode()` method. – Martijn Pieters Nov 06 '13 at 16:37

1 Answer

Python presents you with a representation of the byte string that you can copy and paste into a Python interpreter again.

To keep it readable and able to survive pasting into something that doesn't handle raw bytes, anything that isn't printable is escaped using a Python byte escape code, \xHH, representing the hexadecimal value of the byte.

Anything that is printable is represented directly as its ASCII character. A byte with hex value 0x41 is the capital letter A in ASCII, and is printed as such:

>>> b'\x41'
b'A'
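
To make the rule concrete, here is a rough sketch of how each byte ends up in the printed representation. The helper name `escape_byte` is made up for illustration, and it deliberately ignores the special cases the real representation also handles (backslash, quote characters, and the `\t`, `\n`, `\r` escapes), so treat it as an approximation rather than Python's actual implementation:

```python
def escape_byte(value):
    """Approximate how one byte value is shown inside a bytes repr.

    Simplification (assumption): backslash, quotes, \\t, \\n and \\r
    get special treatment in real Python and are not handled here.
    """
    if 0x20 <= value <= 0x7e:            # printable ASCII: shown as-is
        return chr(value)
    return "\\x{:02x}".format(value)     # everything else: \xHH escape


# First 16 bytes of the output in the question.
header = b'\xff\xd8\xff\xe1/\xfeExif\x00\x00MM\x00*'

# Iterating over a bytes object in Python 3 yields integers.
for value in header:
    print("{:3d}  0x{:02x}  {}".format(value, value, escape_byte(value)))
```

Each row shows the decimal value, the hex value, and the form that appears in the printed output, so `/` and `*` show up as themselves while 0xFF shows up as `\xff`.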

Thus, * is hex 2A, / is hex 2F:

>>> hex(ord(b'*'))
'0x2a'
>>> hex(ord(b'/'))
'0x2f'
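
Assuming Python 3 (which the `b'...'` output in the question suggests), you often don't need `ord()` at all: indexing or iterating over a `bytes` object already gives you integers. A small illustration, with `data` as an arbitrary example value:

```python
data = b'/*M'

print(data[0])        # 47
print(hex(data[1]))   # 0x2a
print(list(data))     # [47, 42, 77]

# Going the other way: build bytes from integer values.
print(bytes([0x2f, 0x2a, 0x4d]))   # b'/*M'
```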

You could use binascii.hexlify() to generate an all-hexadecimal representation of your bytes:

>>> from binascii import hexlify
>>> hexlify(b'\xff\xd8\xff\xe1/\xfeExif\x00\x00MM\x00*\x00\x00\x00\x08\x00\x0b\x01\x0f\x00\x02\x00\x00\x00\x06\x00\x00')
b'ffd8ffe12ffe4578696600004d4d002a00000008000b010f0002000000060000'
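
If the goal is to pull values out of the header rather than just look at it, the standard library's `struct` module can turn groups of bytes into integers. The following is only a sketch applied to the 32 bytes shown in the question; the variable names are my own, and it assumes the layout of a typical Exif JPEG (SOI marker, APP1 segment, `Exif\0\0` identifier, then a TIFF header), which a real parser would have to verify rather than assume:

```python
import struct

# The 32 bytes shown in the question.
data = (b'\xff\xd8\xff\xe1/\xfeExif\x00\x00MM\x00*\x00\x00\x00\x08'
        b'\x00\x0b\x01\x0f\x00\x02\x00\x00\x00\x06\x00\x00')

# JPEG markers and segment lengths are big-endian 16-bit values.
soi, app1, length = struct.unpack(">HHH", data[:6])
print(hex(soi), hex(app1), length)   # 0xffd8 0xffe1 12286

exif_id = data[6:12]                 # the 'Exif\x00\x00' identifier

# The TIFF header follows the identifier and declares its own byte order:
# b'MM' means big-endian, b'II' means little-endian.
tiff_start = 12
byte_order = data[tiff_start:tiff_start + 2]
endian = ">" if byte_order == b"MM" else "<"
magic, ifd_offset = struct.unpack(
    endian + "HI", data[tiff_start + 2:tiff_start + 8])
print(magic, ifd_offset)             # 42 8

# The IFD offset is counted from the start of the TIFF header.
ifd_start = tiff_start + ifd_offset
(entry_count,) = struct.unpack(endian + "H", data[ifd_start:ifd_start + 2])

# Each IFD entry begins with a tag, a field type and a value count.
tag, field_type, count = struct.unpack(
    endian + "HHI", data[ifd_start + 2:ifd_start + 10])
print(entry_count, hex(tag), field_type, count)   # 11 0x10f 2 6
```

So the `/\xfe` in the output is the segment length (0x2FFE), and `\x00*` is the TIFF magic number 42. Tag 0x010F is the camera make, stored as an ASCII field (type 2) of 6 bytes.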

That said, you would be better off installing Pillow (the modernized fork of the Python Imaging Library) and letting it handle JPEG images, including extracting EXIF information, for you.

Martijn Pieters