char a[512] vs char b[512 + 1]

Question

I see a lot of code using the following notation:

char a[512 + 1];  
a[512] = '\0';

Is it not inefficient in terms of memory utilization? Assume you are using a 32-bit machine. And [512 + 1] would actually mean [512 + 4].
It might not be a big deal for server applications, but for embedded system programming it shall matter.

Basically, nowadays this is almost never a cause for a concern. — Cray, Apr 11 '12 at 10:33
see: http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1 — thumbmunkeys, Apr 11 '12 at 10:47

score 3 · Answer 1 · answered Apr 11 '12 at 10:31

char bla[512];
bla[sizeof bla - 1] = '\0';

is better in my opinion.
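
A minimal sketch of how this idiom is often paired with a bounded copy. The helper name `copy_truncated` and the use of `strncpy` are assumptions made for illustration, not something the answer specifies. It shows that any input of 512 characters or more is silently truncated to 511.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into a fixed-size buffer of dst_size bytes,
   truncating if necessary and always leaving the result terminated. */
static void copy_truncated(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    /* strncpy writes at most dst_size - 1 bytes here and zero-fills
       the remainder when src is shorter than that. */
    strncpy(dst, src, dst_size - 1);
    /* Same idiom as above: terminate at the last element of the buffer. */
    dst[dst_size - 1] = '\0';
}
```

Called as `copy_truncated(bla, sizeof bla, input)`, the buffer size is written in exactly one place, which is the main appeal of the `sizeof` form.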

ouah

What happens if length of his string is `512`? – Alok Save Apr 11 '12 at 10:40
@Als the same thing as if the length is `> 512` – ouah Apr 11 '12 at 11:22

Alok Save · Answer 2 · 2012-04-11T10:33:47.970

1

No extra memory is used here. The array is 513 bytes, because the standard guarantees that sizeof(char) is 1 on all implementations. Padding bytes are added inside structures, not inside arrays. With an array you get exactly what you asked for, nothing more.

Note that the syntax serves an additional purpose: it is more readable.
It clearly tells the reader that the string is supposed to hold 512 chars and that one extra char is needed for the terminating \0. The emphasis should be on writing readable code in applications.
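
The size claim can be checked at compile time. This is a small sketch; the function name `size_of_question_array` is made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* sizeof(char) is 1 by definition, so an array of 513 chars is 513 bytes. */
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(char[512 + 1]) == 513, "arrays contain no padding");

/* Hypothetical helper: reports the size of an array declared
   the same way as in the question. */
static size_t size_of_question_array(void)
{
    char a[512 + 1];
    a[512] = '\0';
    return sizeof a;   /* 513 */
}
```

What `sizeof` cannot show is any gap the compiler leaves between this array and the next object on the stack. Such a gap is not part of the array.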

edited Apr 11 '12 at 10:33

answered Apr 11 '12 at 10:28

you're missing the point: Vivek's worry is that because of alignment requirements, there'll need to be padding bytes between the end of `a` and the subsequent object, and this is indeed the case on most platforms; the question could have been better worded (in particular the padding bytes are *not* part of the array and accessing them is UB), but it's a (somewhat) valid concern – Christoph Apr 11 '12 at 10:53
`Padding bytes are added in case of structures not in case of arrays` - except that looots of CPUs have alignment restrictions on the stack. Intel CPUs need at least 8bytes alignment for performance reasons and usually 16... Which means that any compiler for these CPUs will make sure the stack stays aligned. – Voo Apr 11 '12 at 13:13

score 1 · Answer 3 · answered Apr 11 '12 at 11:18

Making the total object size a multiple of the fundamental alignment of the architecture is indeed the most memory-efficient solution.

Additionally, choosing a power of 2 as the object size might avoid fragmentation of allocated objects, but that depends on the implementation of the libc allocator.

On mainstream architectures, your particular example is rarely an issue, but there's a related one: structure padding, with the additional caveat that the compiler isn't free to re-order members at will.
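
A sketch of the structure padding case. The struct and function names are hypothetical, and the concrete numbers in the comments assume a 4-byte `int` with 4-byte alignment, which is common but not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record types, differing only in the array length. */
struct record_513 {
    char text[512 + 1];
    int  count;          /* must start on an int-aligned offset */
};

struct record_512 {
    char text[512];
    int  count;
};

/* The compiler may insert padding but may not reorder the members. */
static_assert(offsetof(struct record_513, count) % _Alignof(int) == 0,
              "count is int-aligned");

/* Bytes that hold neither text nor count. */
static size_t padding_513(void)
{
    return sizeof(struct record_513) - sizeof(char[513]) - sizeof(int);
}

static size_t padding_512(void)
{
    return sizeof(struct record_512) - sizeof(char[512]) - sizeof(int);
}
```

Under the stated assumption, `sizeof(struct record_513)` is 520 with 3 padding bytes, while `sizeof(struct record_512)` is 516 with none.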

score 1 · Answer 4 · answered Apr 11 '12 at 11:19

First of all, the question is far too broad to answer in detail, as it depends on which CPU is used.

but for embedded system programming it shall matter.

Every semi-modern embedded MCU/MPU is likely to support misaligned access and/or have load instructions narrower than 32 bits. Smaller, modern 8/16-bit MCUs will most likely not have any alignment issues at all.

If you do come across a CPU that cannot read misaligned data, there is not much you can do: on such a CPU there is no way to allocate an odd number of bytes anyway.

Pavan Manjunath · Answer 5 · 2012-04-11T10:34:41.183

0

It's just extra safety to accommodate the null character.

And [512 + 1] would actually mean [512 + 4].

512+1 would be 513, unless you are worried about memory alignment issues?

If you still aren't happy, make it

 char arr[511+1];
 arr[sizeof(arr)-1]=0;

edited Apr 11 '12 at 10:34

answered Apr 11 '12 at 10:29


That's exactly what he worries about, so this doesn't answer the question. – Lundin Apr 11 '12 at 11:09
@Lundin If you want safety as well as savings of 3 bytes, then it's difficult. You need to cut down 1 byte on the string length as I've done here. – Pavan Manjunath Apr 11 '12 at 11:13

score 0 · Answer 6 · edited Oct 16 '22 at 16:45

I'm not sure, but it should depend on the alignment restrictions of your CPU. Some RISC CPUs, like MIPS, do not access individual bytes; only four-byte words are accessed using load instructions. In that case every byte would take up a word. I don't think that's the case in real-world situations. I suspect the compiler puts consecutive chars in words, then uses masking to retrieve individual bytes.
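
To illustrate what "masking" means here, this sketch extracts one byte from a 32-bit word using a shift and a mask. The function name is hypothetical. It is not a claim about what any particular compiler emits, and which index corresponds to which memory address depends on the byte order of the machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract byte `index` from a 32-bit word,
   where index 0 is the least significant byte. */
static uint8_t byte_from_word(uint32_t word, unsigned index)
{
    assert(index < 4);
    /* Shift the wanted byte down to bits 0..7, then mask off the rest. */
    return (uint8_t)((word >> (8u * index)) & 0xFFu);
}
```

Four chars packed this way occupy one word in total, rather than one word each.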

score 0 · Answer 7 · answered Apr 11 '12 at 12:03

It depends on the scale. If you have 10 of these, who cares? If you have 1 million? Then you already have 512MB of data, so I don't think the wasted 3MB will be your real problem.

The 1 million objects have to be allocated, thus the allocation control structures will need much more than 3MB. There's more sense in optimizing the allocation, e.g. allocating large arrays instead of single objects. Or storing the strings with byte-exact storage in a large char array, so a string of length 127 will only use 128 bytes, instead of the full 513, wasting much much more. Or even storing each string in a compressed format.
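
A rough sketch of the byte-exact storage idea: strings are appended back to back into one large char array and referred to by offset. The names `string_pool`, `pool_add`, `pool_get` and the capacity are all made up for illustration. A real implementation would need growth, deletion and probably an index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_CAPACITY 4096   /* hypothetical size, chosen for the example */

struct string_pool {
    char   data[POOL_CAPACITY];
    size_t used;             /* number of bytes already occupied */
};

/* Appends s, including its terminator, and returns its offset in the pool.
   Returns (size_t)-1 if the string does not fit. */
static size_t pool_add(struct string_pool *pool, const char *s)
{
    size_t need = strlen(s) + 1;
    if (need > POOL_CAPACITY - pool->used)
        return (size_t)-1;
    size_t offset = pool->used;
    memcpy(pool->data + offset, s, need);
    pool->used += need;
    return offset;
}

/* Returns the terminated string stored at the given offset. */
static const char *pool_get(const struct string_pool *pool, size_t offset)
{
    return pool->data + offset;
}
```

Each string costs its length plus one byte, so a string of length 127 uses 128 bytes.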

If you need the full 512 bytes, you could just as well store the strings without the terminating '\0' and only access them through wrapper functions that take care of this.
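
One possible shape for such wrapper functions, as a sketch only. The names `field_set`, `field_get` and `field_length` are hypothetical. The convention assumed here is that a string shorter than 512 characters is zero-padded, while a string of exactly 512 characters fills the field with no terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_SIZE 512

/* Length of the stored string: up to the first '\0',
   or FIELD_SIZE if the field is completely full. */
static size_t field_length(const char field[FIELD_SIZE])
{
    const char *end = memchr(field, '\0', FIELD_SIZE);
    return end ? (size_t)(end - field) : (size_t)FIELD_SIZE;
}

/* Stores s in the field, zero-padding unused bytes.
   Returns 0 on success, -1 if s is longer than FIELD_SIZE. */
static int field_set(char field[FIELD_SIZE], const char *s)
{
    size_t len = strlen(s);
    if (len > FIELD_SIZE)
        return -1;
    memset(field, 0, FIELD_SIZE);
    memcpy(field, s, len);
    return 0;
}

/* Copies the field into a normal terminated buffer. */
static void field_get(const char field[FIELD_SIZE], char out[FIELD_SIZE + 1])
{
    size_t len = field_length(field);
    memcpy(out, field, len);
    out[len] = '\0';
}
```

The stored field must never be passed directly to functions such as `strlen` or `printf("%s")`, since it may not be terminated.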

Note that all these solutions mean extra work that may be error-prone and need extra care, like the last one. So be sure you will need the scalability to large numbers and that memory will be a bottleneck, else you may do premature optimization with all the impacts on readability and maintainability, without ever needing it.
