
sizeof(char) always returns 1 with the 32-bit GCC compiler.

But since the basic block size with a 32-bit compiler is 4 bytes, how does a char occupy a single byte when the basic size is 4 bytes?

Consider the following:

struct st 
{
int a;
char c;
};

sizeof(st) returns 8, which agrees with the default block size of 4 bytes (since 2 blocks are allotted).

I can't understand why sizeof(char) returns 1 when a char is allotted a block of size 4.

Can someone please explain this?

I would be very thankful for any replies explaining it!

EDIT: The typo 'bits' has been changed to 'bytes'. I apologise to the person who made the first edit; I rolled it back because I did not notice the change you had made. Thanks to all those who pointed out that it must be changed, especially @Mike Burton for downvoting the question and @jalf, who seemed to jump to conclusions about my understanding of the concepts!

sbi
Mor Eru
  • Effectively a duplicate of [Why isn't sizeof for a struct equal to the sum of sizeof of each member?](http://stackoverflow.com/questions/119123/why-isnt-sizeof-for-a-struct-equal-to-the-sum-of-sizeof-of-each-member) – James McNellis Aug 10 '10 at 17:03
  • 3
    In writing portable C programs I've learned not to assume much of anything; even the statement `the basic block size in 32 bit compiler is 4 bytes` is an assumption that may not be true, since data sizes and address sizes are not required to be the same. The "basic block size" depends on a system's memory access architecture, and does not have to correspond to the data register size. Just an aside, as this doesn't affect your basic question about C struct sizes. – Stephen P Aug 10 '10 at 17:18
  • -1 for rolling back to the incorrect usage of "bit" where you mean "byte" – Mike Burton Aug 11 '10 at 17:52
  • 2
    I guess you mean bytes, not bits? – Frank Osterfeld Aug 11 '10 at 19:16
  • 1
    @James, I don't think this is a duplicate. The other question assumes no knowledge of padding or alignment. This question is about why sizeof apparently returns the padded value in some cases, and not others. – Aaron H. Aug 12 '10 at 16:22
  • 3
    @Shyam, In your question, it should be: "But since the basic block size in 32 bit compiler is 4 bytes, How does char occupy a single byte when the basic size is 4 bytes???", and, "sizeof(st) returns as 8 as agreed with the default block size of 4 bytes (since 2 blocks are allotted)" There are eight bits in a byte, 4 bytes in 32 bits. – Aaron H. Aug 12 '10 at 16:25
  • @Mike: if the OP is confused about the difference between bits and bytes, that should be explained in an answer. Don't just change the meaning of his question. – jalf Aug 12 '10 at 21:57
  • 1
    @jalf - you seem to misunderstand the nature of this website. What precisely do you think the edit and community edit functionality is there for? It's not there for good looks. If someone with poor English rolled back a language cleanup edit, that would also be wrong. The meaning of this question is obvious to most of the people who visit it. It should have been obvious to the OP after the first edit. No comment required. – Mike Burton Aug 13 '10 at 04:52
  • 1
    @Mike: when you edit, look in the right sidebar. It says to "respect the original author". If the original author does not himself understand the difference between a bit and a byte, and it is crucial to his question, then it is something the answer has to address, and it cannot be swept under the carpet as "just a language error". My comment isn't because you edited it (which is understandable -- we can't always know what the author intended to write), but your childish -1 because he rolled back your edit. He felt the question no longer represented what he intended to ask. – jalf Aug 13 '10 at 10:37
  • His question is *wrong*, sure, it doesn't strictly speaking make sense because he mixes up bytes and bits, but clearly this is what he wants to know, so we'll just have to clear it up in the answers. The *answer* to the question is obvious to you and me, but it's also obvious to me that the OP simply doesn't understand the difference between bits, bytes and words. He didn't type "bit" instead of "byte" as a typo. He did it because he doesn't know the difference. Anyway, downvoting the OP for exercising his right to control his own question is childish and offensive. – jalf Aug 13 '10 at 10:39
  • First up, I wasn't the person who did the edit. I voted down the rolling back of an edit because it is a bad practice and harmful to the site, not because my feelings were hurt. Second, we'll have to agree to disagree on whether editing the question is proper. If the point of SO isn't to simply answer the question for one person, but for all people who search for it, then editing the question to accurately reflect the question being answered is, in my opinion, the best way to deal with problems like the OP. You'll notice I left a comment so that the OP could learn from the mistake. – Mike Burton Aug 13 '10 at 14:31
  • @jaif... I am the asker of this question and I very well know the difference between bits and bytes and it is upto U to come to assumptions about my knowledge of bits and bytes... I rollbacked because I did not notice the edit to the question and also I am new to this site and dont know the features fully... Pls dont come to any conclusions regarding my understanding of concepts... – Mor Eru Aug 17 '10 at 15:22
  • @Mike Burton.... First of all I did not notice the edit made by some user. I just received a message that it was edited and I rollbacked since I could not see the edit visibly.... "changing 'bits' to 'bytes'." The person who made the edit could have atleast left a small note at the end.... I also dont think y U r so interested in downvoting a question because of a small typo.... U seem to have funny ideas about enforcing correctness in everything.... Most of them noticed the typo as evident in the initial set of comments and none of them downvoted!!! – Mor Eru Aug 17 '10 at 15:30
  • @Mike.... And I have learnt onething from Ur responses... "Never make typos or some correctness enforcers will come and downvote Ur question".. I ll try to follow it!!! – Mor Eru Aug 17 '10 at 15:31
  • @Shyam: It wasn't a mean spirited down vote, it was meant to get your attention, which the comments and even the edit itself didn't seem to be doing. You can view a highlighted edit history by clicking on the edited note just below the question. It's funny to me how bad the reaction to a simple downvote with comment is on this question...everywhere else on the site people are begging for comments to accompany downvotes, here you're mounting an attack on them! Careful what you wish for, as they say. – Mike Burton Aug 17 '10 at 16:11
  • @Mike...This is the first time I even know that a Edit History option exists... As i already told I rollbacked because I did not know of this, Since I have changed it to the rollbacked version.. Plus I added some new comments.. I think that this would be more than suffice. – Mor Eru Aug 17 '10 at 16:20
  • @Shyam: Absolutely. I've already revoked the downvote. – Mike Burton Aug 17 '10 at 18:16
  • possible duplicate of [Why are C character literals ints instead of chars?](http://stackoverflow.com/questions/433895/why-are-c-character-literals-ints-instead-of-chars) – dmckee --- ex-moderator kitten Sep 08 '10 at 23:07

8 Answers

27

sizeof(char) is always 1. Always. The 'block size' you're talking about is just the native word size of the machine - usually the size that will result in most efficient operation. Your computer can still address each byte individually - that's what the sizeof operator is telling you about. When you do sizeof(int), it returns 4 to tell you that an int is 4 bytes on your machine. Likewise, your structure is 8 bytes long. There is no information from sizeof about how many bits there are in a byte.

The reason your structure is 8 bytes long rather than 5 (as you might expect), is that the compiler is adding padding to the structure in order to keep everything nicely aligned to that native word length, again for greater efficiency. Most compilers give you the option to pack a structure, either with a #pragma directive or some other compiler extension, in which case you can force your structure to take minimum size, regardless of your machine's word length.

char is size 1, since that's the smallest access size your computer can handle - for most machines an 8-bit value. The sizeof operator gives you the size of all other quantities in units of how many char objects would be the same size as whatever you asked about. The padding (see link below) is added by the compiler to your data structure for performance reasons, so it is larger in practice than you might think from just looking at the structure definition.

There is a wikipedia article called Data structure alignment which has a good explanation and examples.

Carl Norum
  • 4
    I read the question as: "I understand about block sizes, but why does sizeof(char) return less than block size, while sizeof(struct) returns a padded value." – Aaron H. Aug 10 '10 at 16:52
  • 1
    @Aaron, I think my second paragraph covers that, right? Do you think I need to add some clarification? – Carl Norum Aug 10 '10 at 16:57
  • 1
    He asked why does `sizeof(char)` not include padding. The answer is that there is no padding to `char`, it can be allocated on 1 byte, we can store it on space allocated with `malloc(1)`. `st` cannot be stored on space allocated with `malloc(5)` because when `st` struct is being copied whole 8 bytes are being copied. – adf88 Aug 10 '10 at 17:04
  • @Aaron, That is the exact question!!! @Carl, I think the 2nd paragraph would need some more clarification with respect to Aaron's question. – Mor Eru Aug 11 '10 at 12:50
  • @Shyam, I added another paragraph to extend the explanation of data structure padding. – Carl Norum Aug 11 '10 at 16:13
  • @Carl, that is clearer. Is it safe to say that the sizeof(struct ...) returns the padded size because the struct itself "encapsulates" the padding; the compiler(?) won't return what the smaller allocations in the struct are without some introspection? – Aaron H. Aug 12 '10 at 16:17
9

This is structure alignment with padding: c uses 1 byte, and the remaining 3 bytes are unused. More here

Andrey
6

Sample code demonstrating structure alignment:

#include <iostream>

struct st
{
    int a;
    char c;
};

struct stb
{
    int a;
    char c;
    char d;
    char e;
    char f;
};

struct stc
{
    int a;
    char c;
    char d;
    char e;
    char f;
    char g;
};

int main()
{
    // Sizes observed with a 32-bit GCC build; other compilers may differ.
    std::cout << sizeof(st) << std::endl;   // 8
    std::cout << sizeof(stb) << std::endl;  // 8
    std::cout << sizeof(stc) << std::endl;  // 12
}

The size of a struct can be bigger than the sum of its individual components, because the compiler rounds it up to a multiple of 4 bytes here (the alignment of int on this 32-bit compiler). These results may differ on other compilers, especially 64-bit ones.

Brian
3

First of all, sizeof returns a number of bytes, not bits. sizeof(char) == 1 tells you that a char is one byte long (eight bits on almost every machine you are likely to meet). All of the fundamental data types in C are at least one byte long.

Your structure returns a size of 8. This is a sum of three things: the size of the int, the size of the char (which we know is 1), and the size of any extra padding that the compiler added to the structure. Since many implementations use a 4-byte int, this would imply that your compiler is adding 3 bytes of padding to your structure. Most likely this is added after the char in order to make the size of the structure a multiple of 4 (a 32-bit CPU accesses data most efficiently in 32-bit chunks, and 32 bits is four bytes).

Edit: Just because the block size is four bytes doesn't mean that a data type can't be smaller than four bytes. When the CPU loads a one-byte char into a 32-bit register, the value is sign- or zero-extended automatically by the hardware (depending on whether the char is signed) to make it fill the register. The CPU is smart enough to handle data in N-byte increments (where N is a power of 2), as long as it isn't larger than the register. When storing the data on disk or in memory, there is no reason to store every char as four bytes. The char in your structure happened to look like it was four bytes long because of the padding added after it. If you changed your structure to have two char variables instead of one, you should see that the size of the structure is the same (you added an extra byte of data, and the compiler added one fewer byte of padding).

bta
  • To be pedantic, sizeof(char) == 1 tells you that a char is one char in size. The C standard doesn't require a system to operate in bytes. – Darron Aug 11 '10 at 16:19
  • @Darron- That is incorrect. Section 6.5.3.4 of the C spec (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) specifies "the `sizeof` operator yields the size (in bytes) of its operand". – bta Aug 11 '10 at 16:59
  • paragraph 3 of that section "When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1". Section 3.6 defines byte as "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment". This is not the byte people normally think of. – Darron Aug 11 '10 at 17:37
  • 1
    @Darron: but it is a byte nevertheless. It is just not required to be 8 bits wide. – jalf Aug 12 '10 at 22:10
2

All object sizes in C and C++ are defined in terms of bytes, not bits. A byte is the smallest addressable unit of memory on the computer. A bit is a single binary digit, a 0 or a 1.

On most computers, a byte is 8 bits (so a byte can store 256 distinct values, for example 0 to 255), although computers exist with other byte sizes.

A memory address identifies a byte, even on 32-bit machines. Addresses N and N+1 point to two subsequent bytes.

An int, which is typically 32 bits, covers 4 bytes, meaning that 4 different memory addresses exist that each point to part of the int.

In a 32-bit machine, all the 32 actually means is that the CPU is designed to work efficiently with 32-bit values, and that an address is 32 bits long. It doesn't mean that memory can only be addressed in blocks of 32 bits.

The CPU can still address individual bytes, which is useful when dealing with chars, for example.

As for your example:

struct st 
{
int a;
char c;
};

sizeof(st) returns 8 not because all structs have a size divisible by 4, but because of alignment. For the CPU to efficiently read an integer, it must be located at an address that is divisible by the size of the integer (4 bytes). So an int can be placed at address 8, 12 or 16, but not at address 11.

A char only requires its address to be divisible by the size of a char (1), so it can be placed on any address.

So in theory, the compiler could have given your struct a size of 5 bytes... Except that this wouldn't work if you created an array of st objects.

In an array, each object is placed immediately after the previous one, with no padding. So if the first object in the array is placed at an address divisible by 4, then the next object would be placed at a 5 bytes higher address, which would not be divisible by 4, and so the second struct in the array would not be properly aligned.

To solve this, the compiler inserts padding inside the struct, so its size becomes a multiple of its alignment requirement.

Not because it is impossible to create objects that don't have a size that is a multiple of 4, but because one of the members of your st struct requires 4-byte alignment, and so every time the compiler places an int in memory, it has to make sure it is placed at an address that is divisible by 4.

If you create a struct of two chars, it won't get a size of 4. It will usually get a size of 2, because when it contains only chars, the object can be placed at any address, and so alignment is not an issue.

jalf
1

sizeof returns its value in bytes; you were talking about bits. 32-bit architectures are word aligned and byte addressed. It is irrelevant how the architecture stores a char; to the compiler, chars are referenced 1 byte at a time, even if the hardware moves them around in larger units.

This is why sizeof(char) is 1.

ints are typically 32 bits, hence sizeof(int) = 4; doubles are typically 64 bits, hence sizeof(double) = 8; and so on.

Razor Storm
  • 2
    An `int` is *usually* 32 bits on 32-bit architectures, but it's not defined to *always* be 32 bits. – bta Aug 10 '10 at 16:52
  • 1
    That's what I meant, sorry. The storage used for these stuff are not specified, just that short <= int <= long <= longlong, float <= double <= long double / quad – Razor Storm Aug 10 '10 at 17:08
1

For optimisation reasons, padding is added so that the size of an object is 1, 2 or n*4 bytes (or something like that, talking about x86). That's why padding is added to a 5-byte object but not to a 1-byte one. A single char doesn't have to be padded: it can be allocated in 1 byte, and we can store it in space allocated with malloc(1). st cannot be stored in space allocated with malloc(5), because when an st struct is copied, the whole 8 bytes are copied.

adf88
0

It works the same way as using half a piece of paper. You use one part for a char and the other part for something else. The compiler hides this from you, since how a char is loaded into and stored from a 32-bit processor register depends on the processor.

Some processors have instructions to load and store only part of a 32-bit word; others have to use bitwise operations to extract the value of a char.

Addressing a char works because a char is, AFAIR, by definition the smallest addressable unit of memory. On a 32-bit system, pointers to two different ints will be at least 4 addresses apart, while char addresses can be only 1 apart.

josefx