
I'm attempting to unify the way an ARM project handles its SQLite interactions (specifically, an i.MX27 CPU running Linux 2.6.33.3, compiled with GCC 4.3.2). As part of that, I've created a structure containing a union that holds the values to be bound to prepared statements.

#define SQLITE_DATA_CHARACTER_STRING_MAX 1024

typedef struct
{
    int data_type;
    union
    {
        int integer;
        double floating_point;
        unsigned char character_string[SQLITE_DATA_CHARACTER_STRING_MAX];
    };
}sqlite_data;

Originally the members were int, float, and char. I wanted to use long long, double, and char instead, but that seems to cause a problem. With the definition above (int, double, char), the following code produces the expected output:

int data_fields = 15;
int data_fields_index = 0;
sqlite_data data[data_fields];

LogMsg(LOG_INFO, "%s: Assigning", __FUNCTION__);

for(data_fields_index = 0; data_fields_index < data_fields; data_fields_index++)
{
    data[data_fields_index].data_type = (100 + data_fields_index);
    data[data_fields_index].integer = (1000 + data_fields_index);
    LogMsg(LOG_INFO, "%s: data[%d] - %d; type - %d", __FUNCTION__, data_fields_index, data[data_fields_index].integer, data[data_fields_index].data_type);
}

The output of which is this:

Assigning
data[0] - 1000; type - 100
data[1] - 1001; type - 101
data[2] - 1002; type - 102
data[3] - 1003; type - 103
data[4] - 1004; type - 104
data[5] - 1005; type - 105
data[6] - 1006; type - 106
data[7] - 1007; type - 107
data[8] - 1008; type - 108
data[9] - 1009; type - 109
data[10] - 1010; type - 110
data[11] - 1011; type - 111
data[12] - 1012; type - 112
data[13] - 1013; type - 113
data[14] - 1014; type - 114

However, if I make only one change (giving integer the type long long), it all falls apart. The following change:

typedef struct
{
    int data_type;
    union
    {
        long long integer;
        double floating_point;
        unsigned char character_string[SQLITE_DATA_CHARACTER_STRING_MAX];
    };
}sqlite_data;

produces this output:

Assigning
data[0] - 1000; type - 0
data[1] - 1001; type - 0
data[2] - 1002; type - 0
data[3] - 1003; type - 0
data[4] - 1004; type - 0
data[5] - 1005; type - 0
data[6] - 1006; type - 0
data[7] - 1007; type - 0
data[8] - 1008; type - 0
data[9] - 1009; type - 0
data[10] - 1010; type - 0
data[11] - 1011; type - 0
data[12] - 1012; type - 0
data[13] - 1013; type - 0
data[14] - 1014; type - 0

I've tried deunionizing them, using #pragma pack(6), and putting that array on the heap, all with identical results: int works, long long doesn't.

What's going on here?

pdm
  • Side note, unless you are targeting C11, [unnamed unions are not portable](http://stackoverflow.com/questions/3228104/anonymous-union-within-struct-not-in-c99). Got bit last week when porting a custom protocol to Windows. – Joe Dec 30 '13 at 19:16
  • You can also print out the address of the struct to see if the step changes with the declaration change. That would let you see the padding effect directly. – mpez0 Dec 30 '13 at 19:35
  • Unnamed unions aren't standards-compliant, but they are extremely portable. I'm not aware of any mainstream compiler that doesn't support them as an extension. – Sneftel Dec 30 '13 at 21:58

2 Answers


You're not telling printf() to expect a long long. Use %lld in your format string instead of %d.

Sneftel
  • The `long long` is printing out fine. It's `int type` that's printing out as zero. – pdm Dec 30 '13 at 19:11
    @musasabi But maybe you are messing up your stack with the wrong specifier (it causes undefined behavior). So use `%lld` for printing `long long`. –  Dec 30 '13 at 19:19
    I added some visualization of what is likely happening which I hope helps you understand (this is still undefined behavior so we can't be 100% sure that what I show is happening). Sneftel and H2CO3 are correct. – Joe Dec 30 '13 at 19:52

Sneftel is absolutely correct. The problem is that you are not telling the format string to expect a long long, which results in undefined behavior. To help you visualize it, picture the following:

"%s: data[%d] - %d; type - %d"

%s - [✓] __FUNCTION__ is a valid string.
%d - [✓] data_fields_index is an int
%d - [x] data[data_fields_index].integer is a long long, but %d only
          reads the first 4 bytes of the 8-byte integer.
%d - [x] this should print data_type, but it is likely reading the last
          4 bytes of data[data_fields_index].integer, which are 0 for
          small non-negative values on a little-endian architecture.

So, as a long long, 1000 (0x3E8) is stored in memory on a little-endian machine as:

[0xE8 0x03 0x00 0x00 0x00 0x00 0x00 0x00]
  ^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^^
  integer's %d        type's %d

Changing the format to "%s: data[%d] - %lld; type - %d" will fix this.

Joe
    Exactly. The endianness of the platform is important here, as is the underlying ABI. The takeaway is "undefined behavior can look like anything -- it's not localized". – Sneftel Dec 30 '13 at 21:56
  • I want to thank you both for being exactly correct _and_ so detailed. =) Upvoted, though I can only select one answer. <3 to all. =D – pdm Jan 06 '14 at 18:51