0

Which form is correct for allocating a string in C?

char *sample;

sample = malloc ( length * sizeof(char) );

or

sample = malloc ( length * sizeof(char*) );

Why does char* take 4 bytes when char takes 1 byte?

Graham Borland
fex

9 Answers

6

Assuming the goal is to store a string of length characters, the correct allocation is:

sample = malloc(length + 1);

Notes:

  1. Don't use sizeof (char): it's always 1, so it doesn't add any value.
  2. Remember the terminator. I assumed (based on the name) that length is the length in visible characters of the string, i.e. the value strlen() will return.
  3. I know you didn't, but it's worth pointing out that there should be no cast of the return value from malloc(), either.

The reason char * is larger is that it's a pointer type, and pointers are almost always larger than a single character. On many systems (such as yours, it seems) they are 32 bit, while characters are just 8 bits. The larger size is needed since the pointer needs to be able to represent any address in the machine's memory. On 64-bit computers, pointers are often 64 bits, i.e. 8 characters.
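As a minimal sketch of the allocation above, copying an existing string shows where the + 1 goes. The helper name copy_string is hypothetical (it is not from the question), and the caller is assumed to free the result:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src, or NULL on failure.
   The caller is responsible for calling free() on the result. */
char *copy_string(const char *src)
{
    size_t length = strlen(src);        /* visible characters only */
    char *sample = malloc(length + 1);  /* + 1 for the '\0' terminator */

    if (sample == NULL)
        return NULL;

    memcpy(sample, src, length + 1);    /* copies the terminator too */
    return sample;
}
```

Calling copy_string("hello") requests 6 bytes: 5 visible characters plus the terminator.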

Community
unwind
  • Though `sizeof(char)` doesn't add anything, it makes the intent explicit about what's going on. I guess it's a matter of taste. – P.P Dec 06 '12 at 11:21
  • @KingsIndian: it's a matter of how you read the `malloc` call (and I agree that in turn is a matter of taste). unwind and I would read `malloc(length + 1)` as "allocate `length+1` bytes" or "allocate `length+1` chars", and know that the two mean exactly the same thing in C. Some might read it as "allocate `length+1` of something, but of what? The author doesn't say, maybe they don't know!". To those readers, throwing an explicit `sizeof(char)` in there might help convince them that I meant what I said. – Steve Jessop Dec 06 '12 at 11:31
  • @SteveJessop It may confuse when using `length*sizeof(int)` for allocating `length` number of integers or other data types. Why not calculate it the same way as length*4 bytes (if sizeof(int)=4), and likewise for other data types? This will obviously lead to problems when `sizeof(int)` changes. It requires a complete understanding of the subtleties involved: that `sizeof(char)` is always 1, that `sizeof(int)` may vary, etc. So `length*sizeof(char)` reads as `"length chars"`, and likewise `"length ints"`, `"length floats"`, etc. IMO, this is very explicit and you don't have to think about chars as some kind of special case. – P.P Dec 06 '12 at 11:54
  • @KingIndian: I don't understand who you're saying it would confuse. You? Some hypothetical newbie who will quickly get over their ignorance? Or actual colleagues? Like I say, I don't think it's wrong to put the size in there, just a matter of taste and redundant for real C programmers that I have encountered. And I might write the code differently in a tutorial than in production, because it's for a different audience (newbies), but by the *end* of the tutorial I would reveal the production way(s). – Steve Jessop Dec 06 '12 at 12:15
  • And btw I'm not thinking of chars as a special case of allocation, I'm thinking of chars as the fundamental unit of memory in C. So `malloc(1)` *means*, "allocate 1 char". `malloc(sizeof(int))` *means*, "allocate a number of chars equal to the size of `int` in chars". It's not my choice that they're a special case, but they are. You can choose to ignore this, and write your allocations as though the fundamental unit might be something other than `char`. I don't see any huge advantage, just the minor ones in my answer. It's actively harmful if it prevents people learning what that unit is in C. – Steve Jessop Dec 06 '12 at 12:22
  • @SteveJessop chars are not a special case, that's the point. To me `malloc(length*sizeof(char))` is more easily readable than `malloc(length)` (a couple of people agreed with me). If you think the opposite way, well, that's alright. Since there's no difference between the two and the whole point is based on personal choice, I don't think we can conclude anything. – P.P Dec 06 '12 at 13:13
  • @KingsIndian: I think there's something important in here somewhere about the meaning of source code to humans (rather than to compilers). But since both of us can easily read either of the two possible intended meanings-to-humans out of either of the two ways of writing it, it's difficult to identify what the difference in intended meaning-to-humans is :-) – Steve Jessop Dec 06 '12 at 13:20
2

Why does char* take 4 bytes when char takes 1 byte?

Because you are on a 32-bit system, meaning that pointers take four bytes; char* is a pointer.

char always takes exactly one byte, so you do not need to multiply by sizeof(char):

sample = malloc (length);

I am assuming that length is already padded for null termination.

Sergey Kalinichenko
1
sample = malloc(length);

is the right one

char* is a pointer; a pointer uses 4 bytes (say, on a 32-bit platform)

char is a char; a char uses 1 byte

bph
  • `length * sizeof(char) +1` is somewhat bizarre. I can understand `(length + 1) * sizeof(char)`, and `length + 1` is fine too, but why add 1 after multiplication? – Dietrich Epp Dec 06 '12 at 11:12
1
sample = malloc ( length * sizeof(char) );

First is the correct one if you want to allocate memory for length number of characters.

char* is a pointer type, which happens to be 4 bytes on your platform. So sizeof(char*) evaluates to 4.

But sizeof(char) is always 1, and this is guaranteed by the C standard.

P.P
1

In the given cases you are doing two different things:

In the first case : sample = malloc ( length * sizeof(char) );

You are allocating length multiplied by the size of type char which is 1 byte

While in the second case : sample = malloc ( length * sizeof(char*) );

You are allocating length multiplied by the size of a pointer to char, which is 4 bytes on your machine.

Note that the size in the first case is fixed (sizeof(char) is always 1), whereas in the second case it varies with the platform's pointer size.
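To make the difference concrete, here is a small sketch that computes the byte count each form would pass to malloc. The function names bytes_for_chars and bytes_for_char_pointers are made up for illustration:

```c
#include <stddef.h>

/* Bytes requested by malloc(length * sizeof(char)):
   always equal to length, because sizeof(char) is 1 by definition. */
size_t bytes_for_chars(size_t length)
{
    return length * sizeof(char);
}

/* Bytes requested by malloc(length * sizeof(char *)):
   depends on the platform's pointer size (commonly 4 or 8). */
size_t bytes_for_char_pointers(size_t length)
{
    return length * sizeof(char *);
}
```

For a string, the second form over-allocates by a factor of the pointer size; it would only be the right request for an array of length pointers to char.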

Lucian Enache
1

In your case, you want to allocate an array of length characters. sample will store a pointer to a block whose size is length times the size of what it points to. sizeof(char*) is the size of a pointer to char, not the size of a char.

A good practice is

sample = malloc(length * sizeof(*sample));

Using that, you will reserve length times the size of what you want to point to. This gives you the ability to change the data type at any time, simply by declaring sample to be a pointer to another kind of data.

int *sample;

sample = malloc(length * sizeof(*sample)); // length * sizeof(int), typically length * 4


char *sample;

sample = malloc(length * sizeof(*sample)); // length * 1
tomahh
1

Provided the length already accounts for the nul terminator, I would write either:

sample = malloc(length);

or:

sample = malloc(length * sizeof(*sample));

sizeof(char*) is the size of the pointer, and it is completely irrelevant to the size that the allocated buffer needs to be. So definitely don't use that.

My first snippet is IMO good enough for string-manipulation code. C programmers know that memory and string lengths in C are both measured in multiples of sizeof(char). There's no real need to put a conversion factor in there that everybody knows is always 1.

My second snippet is the One True Way to write allocations in general. So if you want all your allocations to look consistent, then string allocations should use it too. I can think of two possible reasons to make all your allocations look consistent (both fairly weak IMO, but not actually wrong):

  • some people will find it easier to read them that way, only one visual pattern to recognise.
  • you might want to use the code in future as the basis for code that handles wide strings, and a consistent form would remind you to get the allocation right when the length is no longer measured in bytes but in wide chars. Using sizeof(*sample) as the consistent form means you don't need to change that line of code at all, assuming that you update the type of sample at the same time as the units in which length is measured.
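A rough sketch of the wide-string case mentioned in the second point, assuming wchar_t strings from <wchar.h>. The helper name copy_wide_string is hypothetical, and here length excludes the terminator, so the + 1 is added explicitly:

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: heap copy of a wide string, or NULL on failure.
   The caller is responsible for calling free() on the result. */
wchar_t *copy_wide_string(const wchar_t *src)
{
    size_t length = wcslen(src);   /* measured in wide chars, not bytes */

    /* Same allocation line as for char; sizeof(*sample) now scales it. */
    wchar_t *sample = malloc((length + 1) * sizeof(*sample));

    if (sample == NULL)
        return NULL;

    wmemcpy(sample, src, length + 1);  /* count is in wide chars */
    return sample;
}
```

Only the declared type of sample and the string functions changed; the malloc expression is the same as it would be for a char buffer.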

Other options include:

sample = calloc(length, 1);
sample = calloc(length, sizeof(char));
sample = calloc(length, sizeof(*sample));

They're probably fairly pointless here, but as well as the trifling secondary effect of zeroing the memory, calloc has an interesting difference from malloc that it explicitly separates the number and size of objects that you're planning to use, whereas malloc just wants the total size.
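For illustration, a minimal calloc-based sketch. The helper name new_empty_string is invented here, and it assumes length counts visible characters, so one extra element is requested for the terminator:

```c
#include <stdlib.h>

/* Hypothetical helper: returns a zero-filled buffer able to hold
   `length` characters plus the terminator, or NULL on failure. */
char *new_empty_string(size_t length)
{
    /* The element count and element size are passed separately.
       The memory is zeroed, so the buffer already holds a valid
       empty string. */
    return calloc(length + 1, sizeof(char));
}
```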

Steve Jessop
0

For any type T, the usual form is

T *p = malloc(N * sizeof *p);

or

T *p;
...
p = malloc(N * sizeof *p);

where N is the number of elements of type T you wish to allocate. The expression *p has type T, so sizeof *p is equivalent to sizeof (T).

Note that sizeof is an operator like & or *, not a library function; parentheses are only necessary if the operand is a type name like int or char *.
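A short sketch of both spellings with T taken to be int. The function names alloc_ints and alloc_ints_by_type are hypothetical:

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for n ints, or returns NULL.
   The operand of sizeof is the expression *p, so no parentheses are needed. */
int *alloc_ints(size_t n)
{
    int *p = malloc(n * sizeof *p);
    return p;
}

/* Same request spelled with a type name, which does require parentheses. */
int *alloc_ints_by_type(size_t n)
{
    return malloc(n * sizeof(int));
}
```

The first form keeps working unchanged if the declared type of p is later changed; the second has to be edited by hand.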

John Bode
0

Please visit this link, https://www.codesdope.com/c-dynamic-memory/, to understand how memory is allocated dynamically at run time. It may help you understand the concept of malloc and how it allocates memory for a variable.

In your example:

char *sample;
sample = malloc ( length * sizeof(char) );

Here, you declare sample as a pointer to char without yet saying how much memory it needs. In the next line, malloc allocates length * sizeof(char) bytes and the address of that block is assigned to sample. A (char*) cast, where one is written, converts the pointer returned by malloc to a pointer to char; C performs this conversion implicitly, so the cast is not required.

Ankit Lad