
I came across this syntax for reading a BMP file in C++

#include <fstream>
int main() {
    // File names are string literals, so they need double quotes
    std::ifstream in("filename.bmp", std::ifstream::binary);
    in.seekg(0, in.end);
    std::streamsize size = in.tellg();  // file size in bytes
    in.seekg(0);
    unsigned char * data = new unsigned char[size];
    in.read((char *)data, size);        // read() expects a char*

    int width = *(int*)&data[18];
    // omitted remainder for minimal example
}

and I don't understand what the line

int width = *(int*)&data[18];

is actually doing. Why doesn't a simple cast from unsigned char to int, int width = (int)data[18];, work?

RBreight
  • It is taking the memory address of `data[18]`, treating it as a pointer to an integer, then dereferencing it. Basically, treating it as a number. This seems like UB though, since `data` is only size 1 – ChrisMM Dec 05 '19 at 00:00
  • @ChrisMM `&data[18]` is size `8` on 64-bit and `4` on 32-bit, so I think this is only undefined behavior if that ends up causing a read past the end of `data`, no? – William Miller Dec 05 '19 at 00:07
  • *What is *(int*)&data[18] actually doing in this code?* Violating the [Strict Aliasing Rule](https://en.cppreference.com/w/c/language/object#Strict_aliasing), so it could be doing absolutely anything. – user4581301 Dec 05 '19 at 00:10
  • @WilliamMiller, unless I'm misreading, `data` is allocated as an array of 1 `unsigned char`. I think it should have been `new unsigned char[size]` – ChrisMM Dec 05 '19 at 00:11
  • @ChrisMM I totally missed that, I wonder if it's a typo – William Miller Dec 05 '19 at 00:12
  • @ChrisMM that was a typo, thanks for pointing it out – RBreight Dec 05 '19 at 00:14
  • Thanks for cleaning that up, @RBreight . Unfortunately it's still Undefined Behaviour. You can view any object as an array of characters, but the reverse is not true. You can only convert an array of characters to another type if certain conditions are met, and in this case they probably aren't. Consider using `int width; memcpy(&width, &data[18], sizeof (width));` to ensure correct alignment and praying that there are no endian issues. – user4581301 Dec 05 '19 at 00:18
  • @user4581301 thanks for that explanation, I'm confused why this works if it's undefined behavior? (it does get the correct value for width as far as i can tell) – RBreight Dec 05 '19 at 00:22
  • Undefined Behaviour can work; it's just not guaranteed to or even be consistent. In this case the writer is making a bunch of assumptions about how the CPU works and what the CPU will let it get away with. Here they're probably only taking a small performance hit on the CPUs that will allow misaligned accesses. This code will, as @WilliamMiller hints at in his answer, fail hilariously on a system with a non-32 bit `int`. That makes `int32_t width; memcpy(&width, &data[18], sizeof (width));` a better idea than my previous suggestion. The size is fixed. – user4581301 Dec 05 '19 at 00:27
  • Undefined behavior can sometimes appear to work. It's just not guaranteed. – Mark Ransom Dec 05 '19 at 00:28
  • @user4581301 that makes more sense, so `int32_t width; memcpy(&width, &data[18], sizeof(width));` is implementation agnostic but `*(int*)&data[18]` will fail if `int` isn't 32 bits? – RBreight Dec 05 '19 at 00:32
  • Yes, but `*(int*)&data[18]` will also fail on CPUs that require a 32 bit number to be aligned to a 32 bit address (Some CPUs will allow mis-aligned data, but access it much more slowly). Assuming that `data` is aligned to whatever size data the CPU prefers (usually 32 or 64 bits) `data[18]` will not be because 18 is not evenly divisible by 4 (32 bits in bytes). It will also fail if the CPU is [big endian](https://en.wikipedia.org/wiki/Endianness) and the byte order is backwards. – user4581301 Dec 05 '19 at 00:47
  • Off topic, but not quite: [https://learn.microsoft.com/en-us/windows/win32/gdi/bitmap-storage](https://learn.microsoft.com/en-us/windows/win32/gdi/bitmap-storage). – zdf Dec 05 '19 at 01:12

1 Answer


Note

As @user4581301 indicated in the comments, this depends on the implementation and will fail in many instances. And as @NathanOliver- Reinstate Monica and @ChrisMM pointed out this is Undefined Behavior and the result is not guaranteed.

According to the bitmap header format, the width of the bitmap in pixels is stored as a signed 32-bit integer beginning at byte offset 18. The syntax

int width = *(int*)&data[18];

reads the four bytes at offsets 18 through 21, inclusive (assuming a 32-bit int), and interprets them as an integer.
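
As a minimal sketch of which bytes are involved, the same four bytes can be assembled explicitly. The helper name `read_le_i32` is hypothetical (it is not part of the original code), and the sketch assumes the BMP convention that multi-byte header fields are stored little-endian:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds a signed 32-bit value from the four bytes
// starting at p, treating p[0] as the least significant byte (little-endian).
std::int32_t read_le_i32(const unsigned char* p) {
    std::uint32_t u = static_cast<std::uint32_t>(p[0])
                    | (static_cast<std::uint32_t>(p[1]) << 8)
                    | (static_cast<std::uint32_t>(p[2]) << 16)
                    | (static_cast<std::uint32_t>(p[3]) << 24);
    std::int32_t v;
    std::memcpy(&v, &u, sizeof v);  // reinterpret the bit pattern as signed
    return v;
}
```

With the buffer from the question, `read_le_i32(&data[18])` would read offsets 18 through 21. Because the bytes are combined with shifts, the result should not depend on the host's byte order or on the alignment of `&data[18]`.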

How?

  • &data[18] gets the address of the unsigned char at index 18
  • (int*) casts the address from unsigned char* to int*, so that a dereference reads sizeof(int) bytes instead of a single byte
  • *(int*) dereferences the resulting int* to read the int value stored at that address

So basically, it takes the address of data[18] and reads the bytes at that address as if they were an integer.
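
The steps can also be written out one at a time. The sketch below is an illustration under assumptions, not the original author's code: the function name `read_int32_at` is hypothetical, and it swaps the final dereference for `std::memcpy` into an `int32_t` (the alternative suggested in the comments), which reads the same bytes without relying on the pointer cast:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads 4 bytes starting at data[offset] in the host's
// native byte order, which is what *(int*)&data[offset] is intended to do.
std::int32_t read_int32_at(const unsigned char* data, std::size_t offset) {
    const unsigned char* p = &data[offset];  // step 1: address of the byte
    std::int32_t value;                      // fixed width, unlike plain int
    std::memcpy(&value, p, sizeof value);    // steps 2-3: copy the bytes into an int32_t
    return value;
}
```

`read_int32_at(data, 18)` corresponds to the expression in the question. It still interprets the bytes in the host's byte order, so on a big-endian machine the BMP width would come out byte-swapped.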

Why doesn't a simple cast to `int` work?

sizeof(data[18]) is 1, because an unsigned char is one byte (values 0-255). sizeof(&data[18]), the size of the pointer itself, is typically 4 on a 32-bit system and 8 on a 64-bit system, but the pointer's size does not determine how many bytes are read; the pointed-to type does. Casting to (int*) and then dereferencing reads sizeof(int) bytes (4, assuming a 32-bit int), namely the 4 bytes at offsets 18 through 21, inclusive. A simple cast from unsigned char to int also yields a 4-byte value, but it carries only the one byte of information stored in data[18]. This is illustrated by the following example:

#include <iostream>
#include <bitset>
#include <string>

int main() {
    // Stand-in buffer for the file contents (zero-initialized)
    unsigned char data[32] = {};

    // Populate 18-21 with a recognizable pattern for demonstration
    std::bitset<8> _bits(std::string("10011010"));
    unsigned long bits = _bits.to_ulong();
    for (int ii = 18; ii < 22; ii ++) {
        data[ii] = static_cast<unsigned char>(bits);
    }

    std::cout << "data[18]                    -> 1 byte  " 
        << std::bitset<32>(data[18]) << std::endl;
    std::cout << "*(unsigned short*)&data[18] -> 2 bytes " 
        << std::bitset<32>(*(unsigned short*)&data[18]) << std::endl;
    std::cout << "*(int*)&data[18]            -> 4 bytes " 
        << std::bitset<32>(*(int*)&data[18]) << std::endl;
}

data[18]                    -> 1 byte  00000000000000000000000010011010
*(unsigned short*)&data[18] -> 2 bytes 00000000000000001001101010011010
*(int*)&data[18]            -> 4 bytes 10011010100110101001101010011010
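
Because every byte in the pattern above is identical, it does not show which byte ends up where. A sketch with distinct bytes may make the difference between the two reads clearer. It uses a hypothetical width of 640 (0x00000280, stored least significant byte first as in a BMP header), and the function names are illustrative only:

```cpp
#include <cstdint>

// Equivalent of (int)p[0]: converts the value of one byte and ignores
// the three bytes that follow it.
int cast_single_byte(const unsigned char* p) {
    return static_cast<int>(p[0]);
}

// Combines all four bytes, least significant first, into one 32-bit value.
std::uint32_t combine_four_bytes(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

For the bytes 0x80 0x02 0x00 0x00, `cast_single_byte` gives 128 (only the low byte), while `combine_four_bytes` gives 640.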
William Miller