101

I'm trying to read a BMP file in Python. I know the first two bytes hold the BMP signature and the next 4 bytes hold the file size. When I execute:

fin = open("hi.bmp", "rb")
firm = fin.read(2)  
file_size = int(fin.read(4))  

I get:

ValueError: invalid literal for int() with base 10: 'F#\x13'

What I want to do is read those four bytes as an integer, but it seems Python is reading them as characters and returning a string, which cannot be converted to an integer. How can I do this correctly?

Lucas Walter
Manuel Araoz
  • 2
    If your goal is to *use* the bitmap instead of spending time writing your own BMP library (not that that doesn't sound like fun...) you can use PIL http://www.pythonware.com/products/pil/ which you may already have installed. Try: import Image – Jared Updike Jul 22 '09 at 07:24
  • 9
    Thanks Jared, but I wanted to read the bmp manually only to have fun! :) – Manuel Araoz Jul 22 '09 at 07:33

7 Answers

140

The read method returns a sequence of bytes (a str in Python 2, a bytes object in Python 3). To convert that byte sequence into an integer, use the struct module from the standard library: http://docs.python.org/library/struct.html.

import struct

print(struct.unpack('i', fin.read(4)))

Note that unpack always returns a tuple, so struct.unpack('i', fin.read(4))[0] gives the integer value that you are after.

You should probably use the format string '<i' (< is a modifier that indicates little-endian byte order with standard sizes and no alignment padding; the default is to use the platform's native byte order, size and alignment). According to the BMP format spec, the bytes should be written in Intel/little-endian byte order.
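As a minimal sketch of the advice above, the following reads the signature and the size field with an explicit little-endian format. It assumes the size field is an unsigned 32-bit value (hence '<I' rather than '<i'), and it uses a hand-built 6-byte header in an io.BytesIO object as a stand-in for the real file; read_bmp_size is a hypothetical helper name, not part of any library.

```python
import io
import struct

def read_bmp_size(stream):
    """Return (signature, file_size) from the first 6 bytes of a BMP stream."""
    signature = stream.read(2)               # b"BM" for a Windows bitmap
    raw = stream.read(4)
    if len(raw) != 4:
        raise ValueError("stream too short to contain a BMP size field")
    (file_size,) = struct.unpack("<I", raw)  # "<" = little-endian, "I" = unsigned 32-bit
    return signature, file_size

# Stand-in for open("hi.bmp", "rb"): the signature plus a size of 1254 bytes.
fake_header = io.BytesIO(b"BM" + struct.pack("<I", 1254))
signature, file_size = read_bmp_size(fake_header)
print(signature, file_size)
```

With a real file, the same helper could be called on the object returned by open("hi.bmp", "rb").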

Rufflewind
codeape
  • 22
    Instead of writing `i = struct.unpack(...)[0]` I often write `i, = struct.unpack(...)` – Otto Allmendinger Jul 22 '09 at 10:32
  • @Otto Is there any reason you prefer one way over the other? Is there any logical difference? – Caltor Oct 16 '12 at 22:45
  • 4
    I find it very surprising that there isn't a built-in function to read integers (or Shorts etc) from a file in Python. I'm no Java expert but I believe it has native functions such as readUnsignedShort() to do this. – Caltor Oct 16 '12 at 22:47
  • @codeape Could you define what the [0] is doing please or at least what type of language element it is. It isn't immediately apparent and it is almost impossible to search for in the Python documentation. – Caltor Oct 16 '12 at 23:37
  • For lists and tuples, obj[N] means: get the Nth element of obj. See http://docs.python.org/tutorial/introduction.html#lists – codeape Oct 17 '12 at 10:27
  • @Caltor `struct.unpack` is a builtin function (try `type(struct.unpack)`). – gerrit Jul 01 '15 at 11:29
  • @gerrit you still need to use 2 functions rather than 1 though. Why does everything have to be read from file as a string and then converted to a number rather than just reading a number straight from a file? – Caltor Jul 01 '15 at 11:38
  • @Caltor It's not read as a string, at least not in Python3, where `struct.unpack` expects a `bytes` object. Reading from a buffer or stream has the advantage that I can pass bytes from *other* streams, for example, I can use `gzip.GzipFile` to read data without having to unpack the entire file into memory. – gerrit Jul 01 '15 at 11:53
65

An alternative method which does not make use of 'struct.unpack()' would be to use NumPy:

import numpy as np

f = open("file.bin", "rb")  # open in binary mode
a = np.fromfile(f, dtype=np.uint32)

'dtype' represents the datatype and can be int#, uint#, float#, complex# or a user defined type. See numpy.fromfile.

Personally, I prefer using NumPy to work with array/matrix data, as it is a lot faster than using Python lists.

Peter Mortensen
Emanuel Ey
  • 16
    File opening can be skipped: `a = np.fromfile('file.bin', dtype=np.uint32)` – Mathieu Schopfer Jan 23 '15 at 12:32
  • In my case this didn't directly work. Depending on your encoding you may try more esoteric dtypes such as : `np.fromfile( file, dtype='>i2')` , > or < determine big or little endian. Depending on the number of byte you can go with i2 or i4 – Adrien Mau Jan 20 '22 at 12:10
  • 1
    To me, the idea of using such a huge and complicated package as NumPy for so low-level and elementary operations is very much of an overkill. – Nikolaj Š. Jun 15 '22 at 15:00
34

As of Python 3.2, you can also do this with the built-in int.from_bytes method:

file_size = int.from_bytes(fin.read(2), byteorder='big')

Note that this function requires you to specify whether the number is encoded in big- or little-endian format, so you will have to determine the endian-ness to make sure it works correctly.
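For the BMP header in the question, the same idea might look like the sketch below. It assumes the 4-byte size field is little-endian and unsigned, as the BMP format specifies, and it builds the header bytes by hand (the header variable is illustrative) instead of reading hi.bmp.

```python
import io

# Illustrative stand-in for open("hi.bmp", "rb"): "BM" followed by the
# size 1254 encoded as 4 little-endian bytes.
header = io.BytesIO(b"BM" + (1254).to_bytes(4, byteorder="little"))

firm = header.read(2)
file_size = int.from_bytes(header.read(4), byteorder="little", signed=False)
print(firm, file_size)

# Interpreting the same 4 bytes as big-endian gives a different value,
# which is why the byte order has to match the file format.
wrong = int.from_bytes((1254).to_bytes(4, byteorder="little"), byteorder="big")
```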

CrepeGoat
  • in python3, `int`'s are dynamically sized (it's the same implementation as Python2's `long`; this is also sometimes referred to as "big int's"), which I believe is the motivation for adding this `int` method in the first place. https://docs.python.org/3/library/stdtypes.html#:~:text=Integers%20have%20unlimited%20precision. – CrepeGoat Jan 19 '21 at 14:56
  • So why 2? How do you know? – Gilad Jan 19 '21 at 18:03
  • 1
    ohhh the number 2's just from the OP; it's not a magic number or anything. this could be generalized by replacing 2 with some positive integer variable `n` and it'd work all the same – CrepeGoat Jan 19 '21 at 19:21
  • I wonder why you used big-endian byteorder, when for most people correct option would be "little". Not to mention that BMP specification requires little-endianness. – Nikolaj Š. Jun 15 '22 at 15:05
  • @Klas thanks for clarifying that BMP uses little endian, I didn't know that! I'm also not sure what "most people" use, so tbh I just picked one arbitrarily. the point though is that you can choose between `'little'` and `'big'` depending on what you need. – CrepeGoat Jun 16 '22 at 15:23
  • 1
    Well, most of modern architectures and OSes are little-endian, from what I know (admittedly, not too much). So when I tried your approach, I got weird result. And it might confuse people who aren't ready to investigate and cost you some reputation points =) – Nikolaj Š. Jun 17 '22 at 13:33
6

Besides struct, you can also use the array module:

import array
values = array.array('l')  # signed long; item size is platform-dependent (check values.itemsize)
values.fromfile(fin, 1)    # read 1 integer (array.read was removed in Python 3)
file_size = values[0]
Nick Dandoulakis
  • Good point. But this solution is not as flexible as that of the struct module, since all elements read through values.read() must be long integers (it is not convenient to read a long integer, a byte, and then a long integer, with the array module). – Eric O. Lebigot Jul 22 '09 at 09:40
  • I agree. `array` is an efficient way to read a binary file but not very flexible when we have to deal with structure, as you correctly mentioned. – Nick Dandoulakis Jul 22 '09 at 10:42
  • 1
    array.read is deprecated in favor of array.fromfile since 1.51 –  Aug 04 '11 at 17:04
4

Since you are reading a binary file, you need to unpack the bytes into an integer. Use the struct module for that:

import struct
fin = open("hi.bmp", "rb")
firm = fin.read(2)  
file_size, = struct.unpack("<i", fin.read(4))  # "<" forces little-endian, as BMP requires
Anurag Uniyal
1

In Python 3, reading from a file opened in binary mode returns a bytes object. It behaves like an immutable sequence (a bit like a tuple) whose elements are integers from 0 to 255.

Try:

file_size = fin.read(4)
file_size0 = file_size[0]
file_size1 = file_size[1]
file_size2 = file_size[2]
file_size3 = file_size[3]

Or:

file_size = list(fin.read(4))

Instead of:

file_size = int(fin.read(4))
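
The individual byte values still have to be combined into one number. A minimal sketch, assuming the four bytes are in little-endian order (as other answers note for BMP) and using a hand-written byte string in place of fin.read(4); combine_little_endian is a hypothetical helper name:

```python
def combine_little_endian(raw):
    """Combine a bytes object into an unsigned integer, least significant byte first."""
    value = 0
    for position, byte in enumerate(bytearray(raw)):
        value |= byte << (8 * position)  # byte 0 supplies the lowest 8 bits
    return value

raw = b"\xe6\x04\x00\x00"  # stand-in for fin.read(4)
file_size = combine_little_endian(raw)
print(file_size)
```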
Programmer S
0

Here's a late solution, but I thought it might help.

fin = open("hi.bmp", "rb")
firm = fin.read(2)
file_size = 0
for i in range(4):
    # BMP is little-endian: the first byte read is the least significant
    file_size |= ord(fin.read(1)) << (8 * i)