I will be coaching an ACM Team next month (go figure), and the time has come to talk about strings in C. Besides a discussion on the standard lib, strcpy, strcmp, etc., I would like to give them some hints (something like str[0] is equivalent to *str, and things like that).

Do you know of any lists (like cheat sheets) or your own experience in the matter?

I'm already aware of the books for the ACM competition (which are good, see particularly this), but I'm after tricks of the trade.

Thank you.

Edit: Thank you very much everybody. I will accept the most voted answer, and have duly upvoted others which I think are relevant. I expect to do a summary here (like I did here, asap). I have enough material now and I'm certain this has improved the session on strings immensely. Once again, thanks.

Dervin Thunk
  • Fine. Title has been changed. – Dervin Thunk Aug 17 '09 at 22:58
  • *str is not equivalent to str[0]. So, start with that. – jkeys Aug 17 '09 at 23:12
  • @Hooked: How not? `a[i]` is equivalent to `*(a+i)`, meaning `a[0]` is equivalent to `*(a+0)`, which is in turn equivalent to `*a`. – Chuck Aug 17 '09 at 23:18
  • a[0] returns a direction reference. *str dereferences a pointer (that is why it's called indirection). Two different things. – jkeys Aug 17 '09 at 23:56
  • My Google skills cannot find any use of the phrase "direction reference" in relation to C. The equivalence of arrays and pointers in C that I illustrated earlier is pretty well-known (it's even on the Wikipedia page), so I really can't figure out what you're trying to say. – Chuck Aug 18 '09 at 00:05
  • @Hooked. Your comment is factually wrong. Would you mind deleting it, so it doesn't confuse other people? – Dervin Thunk Aug 18 '09 at 01:02
  • Err, direct reference. That is factually correct. a[0] always returns a reference, that is why it can be an lvalue. I understand that pointers and arrays are closely related (especially for cstrings), but *str returns a reference to wherever it is pointing, and a[0] always returns a reference to the first element. That is factually correct. – jkeys Aug 18 '09 at 23:40
  • Is the first element of `a` ever something different from the element to which `a` is pointing? Otherwise, it sounds like you just said they're the same. – Chuck Aug 19 '09 at 07:00
  • @Hooked: a[0] always returns an lvalue (don't know quite what you mean by reference here) where a is pointing. *a always returns an lvalue where a is pointing. Read Chuck's excellent comment showing why they're the same thing. – David Thornley Aug 19 '09 at 20:22

16 Answers

26

It's obvious, but I think it's important to know that a string is nothing more than an array of bytes terminated by a zero byte. C strings aren't all that user-friendly, as you probably know.

  • Writing a zero byte somewhere in the string will truncate it.
  • Going out of bounds generally ends badly.
  • Never, ever use strcpy, strcmp, strcat, etc. on data that you can't prove is terminated and fits; use the length-bounded variants instead: strncmp, strncat, strndup (POSIX), ...
  • Avoid strncpy. strncpy will not always zero-terminate your string! If the source string doesn't fit in the destination buffer, it truncates the string but doesn't write a NUL byte at the end of the buffer. Also, even if the source string is much shorter than the destination, strncpy still fills the rest of the destination with zeroes. I personally use strlcpy (a BSD function, not standard C).
  • Don't use printf(string); use printf("%s", string) instead. Think about the consequences if the user puts a %d in the string.
  • You can't compare strings with
    if( s1 == s2 )
        doStuff(s1);
    because that only compares the two pointers. You have to compare every character in the string. Use strcmp or, better, strncmp.
    if( strncmp( s1, s2, BUFFER_SIZE ) == 0 )
        doStuff(s1);
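
The difference between comparing pointers and comparing characters can be sketched as follows. The helper names same_text and same_address are made up for this illustration; they are not part of any library.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when both strings hold the same characters. */
bool same_text(const char *s1, const char *s2)
{
    return strcmp(s1, s2) == 0;
}

/* Hypothetical helper: true only when both pointers refer to the same memory. */
bool same_address(const char *s1, const char *s2)
{
    return s1 == s2;
}
```

Two separate arrays that both contain "acm" satisfy same_text but not same_address, which is exactly why s1 == s2 is the wrong test for string equality.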
Dour High Arch
Kasper
  • If you are really using printf and not a wrapper macro which does additional things, puts/fputs are the functions you are looking for. –  Aug 18 '09 at 00:02
  • Is strlcpy() standard C? It's probably important to know that for a competition. If not, be prepared to write it. Also, strcpy etc. is safe if you can prove that the destination is sufficiently long. – David Thornley Aug 19 '09 at 20:24
  • I'm personally fond of using strncpy, followed by writing a NUL to the end of the destination array. That way I know it wasn't over-written, and I know it's terminated. Since strlcpy is not (to my knowledge) yet a standard, I don't like to rely on it when I'm bouncing between environments. – Michael Kohne Aug 19 '09 at 20:35
  • @David Thornley: strlcpy is indeed not standard and that idiot Drepper refuses to put it in glibc. But it turns out really great, because the strlcpy I wrote is faster than strcpy. I don't like strncpy because it overwrites the whole array, instead of just what size I give. – Kasper Aug 19 '09 at 21:26
  • Note that you cannot safely use `strncat()` unless you can safely use `memmove()` or `memcpy()` instead. In particular, `strncat(target, source, sizeof(target))` is incorrect unless you know that `*target == '\0'`. Using `strncat()` is usually a mistake. – Jonathan Leffler Jun 11 '16 at 04:29
5

Calling strlen() over and over can dramatically worsen performance.

for( size_t i = 0; i < strlen( string ); i++ ) {
    processChar( string[i] );
}

will have O(n^2) time complexity unless the compiler manages to hoist the strlen() call out of the loop, whereas

size_t length = strlen( string );
for( size_t i = 0; i < length; i++ ) {
    processChar( string[i] );
}

will have O(n) time complexity, assuming processChar() takes constant time. This is not so obvious to people who haven't taken the time to think about it.
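
Another way to avoid the repeated strlen() call is to not measure the string at all and simply walk it until the terminating zero byte. A minimal sketch, where count_vowels is a made-up example function standing in for whatever per-character work is needed:

```c
#include <stddef.h>

/* Counts the lowercase vowels in s in a single pass, stopping at the
 * terminating '\0'. No call to strlen() is needed. */
size_t count_vowels(const char *s)
{
    size_t count = 0;
    for (const char *p = s; *p != '\0'; ++p) {
        switch (*p) {
        case 'a': case 'e': case 'i': case 'o': case 'u':
            ++count;
            break;
        default:
            break;
        }
    }
    return count;
}
```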

sharptooth
  • But wouldn't the compiler optimize that and only really access the `strlen()` function once? – galois Dec 12 '15 at 08:42
  • @jaska Maybe it will, maybe it will not; it depends on a lot of factors. The Standard certainly doesn't require the compiler to optimize it away, nor does it prohibit such an optimization. – sharptooth Dec 14 '15 at 09:39
3

The following functions can be used to implement a non-mutating strtok:

strcspn(string, delimiters)
strspn(string, delimiters)

The first returns the index of the first character that is in the set of delimiters you pass in. The second returns the index of the first character that is not in that set.

I prefer these to strpbrk because they return the length of the string when nothing matches, whereas strpbrk returns NULL.
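
A non-mutating tokenizer built on these two calls might look like the sketch below. The function name next_token and its cursor-based interface are assumptions made for this example, not an existing API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: finds the next token in *cursor without writing to
 * the string. Returns a pointer to the token's first character and stores
 * its length in *len, or returns NULL when no tokens remain.
 */
const char *next_token(const char **cursor, const char *delims, size_t *len)
{
    const char *start = *cursor + strspn(*cursor, delims); /* skip delimiters */
    if (*start == '\0') {
        *cursor = start;
        return NULL;
    }
    *len = strcspn(start, delims); /* run of non-delimiter characters */
    *cursor = start + *len;
    return start;
}
```

Because the token is not zero-terminated in place, print it with a precision, for example printf("%.*s\n", (int)len, tok).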

MSN
3

str[0] is equivalent to 0[str], or more generally str[i] is i[str] and i[str] is *(str + i).

NB: this is not specific to strings; it works for any C array or pointer.
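
A tiny check of the equivalence, for anyone who wants to see it compile. The function name index_forms_agree is invented for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when all three spellings read the same character at index i.
 * i must be a valid index into str. */
bool index_forms_agree(const char *str, size_t i)
{
    return str[i] == i[str] && str[i] == *(str + i);
}
```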

dfa
3

The strn* functions in <string.h> do not necessarily null-terminate the destination string; strncpy is the usual offender.

As an example: from MSDN's documentation on strncpy:

The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If count is greater than the length of strSource, the destination string is padded with null characters up to length count.
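
One common workaround is to call strncpy with one byte less than the buffer size and then write the terminator yourself. A minimal sketch, assuming truncation of over-long input is acceptable; copy_terminated is a made-up helper name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copies src into dst (capacity dst_size, which must be
 * at least 1) with strncpy, then forces termination. Long input is truncated.
 */
void copy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1); /* may leave dst without a '\0' */
    dst[dst_size - 1] = '\0';        /* so write one explicitly */
}
```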

MSN
  • Actually, it's not the complete strn* family, only strncpy. strncat got its own problems too though. Still, writing the null wouldn't necessarily make your program safer. What if you wanted to transfer the contents of the file /etc/passwd-archive/public-data, but your data gets truncated by strncpy to /etc/passwd? – Kasper Aug 17 '09 at 23:12
  • Yes, the general problem of using strings safely in an unmanaged dynamic memory environment is itself a Master's thesis in and of itself. Assuming you still want to do it :) – MSN Aug 17 '09 at 23:21
2

strtok is not thread-safe, since it keeps hidden static state between calls; for the same reason you cannot interleave or nest strtok calls.

A more useful alternative is strtok_r (POSIX, not ISO C); use it whenever it is available.
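
Nesting is where the difference shows. The sketch below splits records on ';' and fields on ',' with two independent state pointers, something plain strtok cannot do. It assumes a POSIX system; count_fields is a made-up example function.

```c
#define _POSIX_C_SOURCE 200809L /* strtok_r is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/*
 * Counts the comma-separated fields across all semicolon-separated records
 * in text. The buffer is modified in place, exactly as strtok would do.
 */
size_t count_fields(char *text)
{
    size_t fields = 0;
    char *outer_state = NULL;
    for (char *record = strtok_r(text, ";", &outer_state);
         record != NULL;
         record = strtok_r(NULL, ";", &outer_state)) {
        char *inner_state = NULL;
        for (char *field = strtok_r(record, ",", &inner_state);
             field != NULL;
             field = strtok_r(NULL, ",", &inner_state)) {
            ++fields;
        }
    }
    return fields;
}
```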

dfa
2

A common mistake is to confuse strlen() with sizeof() when using a string:

const char *p = "hello!!";
strlen(p) != sizeof(p)

sizeof(p) yields, at compile time, the size of the pointer (typically 4 or 8 bytes), whereas strlen(p) counts, at runtime, the length of the null-terminated char array (7 in this example).
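
The three quantities can be put side by side. In this sketch the variable and function names are invented for the illustration; note that sizeof on a true array counts the terminating zero byte as well.

```c
#include <stddef.h>
#include <string.h>

static const char greeting_array[] = "hello!!";  /* array of 8 chars */
static const char *greeting_pointer = "hello!!"; /* pointer to a literal */

/* Size of the array object: 7 characters plus the terminating '\0'. */
size_t array_bytes(void) { return sizeof greeting_array; }

/* Size of the pointer itself, whatever it points to. */
size_t pointer_bytes(void) { return sizeof greeting_pointer; }

/* Number of characters before the terminating '\0', counted at runtime. */
size_t text_length(void) { return strlen(greeting_pointer); }
```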

dfa
2

kmm already has a good list. Here are the things I had problems with when I started to code in C.

  1. String literals have static storage duration, so they stay valid for the whole run of the program. Hence they can, for example, be the return value of a function. They must not be modified, though.

  2. Memory management of strings, in particular with a high-level library (not libc). Who is responsible for freeing the string if it is returned by a function or passed to a function?

  3. When should "const char *" be used and when "char *"? And what does it tell me if a function returns a "const char *"?

None of these questions is too difficult to learn, but they are hard to figure out if nobody teaches them to you.
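
The three points can be seen in one small sketch. Both function names are hypothetical, and the ownership rule shown (the caller frees what the function allocated) is one common convention rather than a rule of the language.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a string literal: valid for the whole program, never freed,
 * never modified. The const in the return type signals that. */
const char *weekday_name(int day)
{
    static const char *const names[] = {
        "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
    };
    return (day >= 0 && day < 7) ? names[day] : "???";
}

/* Returns freshly allocated memory: the caller owns it and must free() it.
 * Returns NULL if allocation fails. */
char *duplicate_text(const char *text)
{
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, text, size);
    }
    return copy;
}
```

A plain char * return type, as in duplicate_text, usually suggests that the caller may modify the result and is often responsible for releasing it; a const char * return type suggests neither.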

quinmars
1

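I have found the char buff[0] technique (the "struct hack") incredibly useful. Note that a zero-length array is a compiler extension; C99 spells the same idea as a flexible array member, char payload[]. Consider: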

struct foo {
   int x;
   char * payload;
};

vs

struct foo {
   int x;
   char payload[0];
};

See https://stackoverflow.com/questions/295027 for implications and variations.
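
A minimal sketch of how the second layout is typically allocated, written with the C99 flexible array member instead of the [0] extension. The names struct message and message_create are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

struct message {
    int x;
    char payload[]; /* C99 flexible array member; must be the last member */
};

/* Hypothetical helper: allocates the header and a copy of text in one block.
 * The caller releases everything with a single free(). */
struct message *message_create(int x, const char *text)
{
    size_t size = strlen(text) + 1;
    struct message *m = malloc(sizeof *m + size);
    if (m != NULL) {
        m->x = x;
        memcpy(m->payload, text, size);
    }
    return m;
}
```

Compared with the char * payload version, this needs one allocation instead of two and keeps the string next to the header in memory.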

ezpz
1

I'd point out the performance pitfalls of over-reliance on the built-in string functions.

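#include <stdlib.h>
#include <string.h>

char* triple(const char* source)
{
   size_t n = strlen(source);
   char* dest = malloc(n*3 + 1);
   if (dest == NULL) return NULL;
   strcpy(dest, source);
   strcat(dest, source);   /* rescans dest to find its end */
   strcat(dest, source);   /* rescans dest again, now twice as long */
   return dest;
}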
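
Since the length is already known after the strlen call, the repeated scanning done by strcat can be avoided. One possible rewrite, offered as a sketch; the name triple_fast is made up here.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string holding source three times, or NULL if
 * allocation fails. The length is measured once and reused. */
char *triple_fast(const char *source)
{
    size_t n = strlen(source);
    char *dest = malloc(3 * n + 1);
    if (dest == NULL) {
        return NULL;
    }
    memcpy(dest, source, n);
    memcpy(dest + n, source, n);
    memcpy(dest + 2 * n, source, n);
    dest[3 * n] = '\0';
    return dest;
}
```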
AShelly
1

I would discuss when and when not to use strcpy and strncpy and what can go wrong:

char *strncpy(char* destination, const char* source, size_t n);

char *strcpy(char* destination, const char* source );

I would also mention the return values of the string comparison functions. For example, ask "does this if statement pass or fail?" (note that stricmp is a non-standard extension; POSIX calls it strcasecmp):

if (stricmp("StrInG 1", "string 1")==0)
{
    .
    .
    .
}
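
Because neither stricmp nor strcasecmp is part of standard C, a contest environment may not have it. A portable stand-in can be sketched with tolower; the name compare_ignore_case is an assumption of this example, and case folding follows whatever tolower does in the current locale.

```c
#include <ctype.h>

/* Hypothetical portable replacement for stricmp/strcasecmp: returns a value
 * less than, equal to, or greater than zero, ignoring letter case. */
int compare_ignore_case(const char *s1, const char *s2)
{
    for (;; ++s1, ++s2) {
        /* The cast avoids undefined behavior for negative char values. */
        int c1 = tolower((unsigned char)*s1);
        int c2 = tolower((unsigned char)*s2);
        if (c1 != c2 || c1 == '\0') {
            return c1 - c2;
        }
    }
}
```

Like strcmp, it returns 0 for a match, so the if statement in the question above passes only when the comparison result is 0.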
bn.
  • `stricmp()` is not an ANSI C standard function, it's an extension provided by MS VC++ and perhaps some other implementations. In GCC, the function is called `strcasecmp()` (probably the one time I'll actually side with Microsoft on something), but is still not standard. – Chris Lutz Aug 18 '09 at 04:53
1

Perhaps you could illustrate the role of the '\0' sentinel with the following example, which prints only "hello ":

const char* a = "hello \0 world";
char b[100];
strcpy(b, a);
printf("%s", b);

I once got my fingers burnt when, in my zeal, I used strcpy() to copy binary data. It worked most of the time but failed mysteriously now and then. The mystery was solved when I realized that the binary input sometimes contained a zero byte and strcpy() would stop there.
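
For binary data the length has to travel alongside the pointer, and the copy has to be done with memcpy. A minimal sketch; duplicate_bytes is a made-up helper name.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the first len bytes of
 * data, zero bytes included, or NULL on failure. The caller frees it. */
void *duplicate_bytes(const void *data, size_t len)
{
    void *copy = malloc(len > 0 ? len : 1); /* avoid a zero-size request */
    if (copy != NULL) {
        memcpy(copy, data, len);
    }
    return copy;
}
```

With the array "hello \0 world", strlen reports 6 while sizeof reports 14, so only the sizeof-based copy keeps the bytes after the embedded zero.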

Rohin
0

You could mention indexed addressing.

An element's address is the base address + index * sizeof(element); in C pointer arithmetic the compiler applies the sizeof scaling for you.
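
The formula can be checked by doing the byte arithmetic by hand and comparing it with what the compiler computes. This is only an illustrative sketch using int elements; address_matches is an invented name.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when &array[index] equals the base address plus
 * index * sizeof(int), computed by hand in bytes.
 * index must be within the array (or one past its end). */
bool address_matches(const int *array, size_t index)
{
    const unsigned char *base = (const unsigned char *)array;
    const unsigned char *by_hand = base + index * sizeof array[0];
    return (const unsigned char *)&array[index] == by_hand;
}
```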

Erix
  • You should clarify: in C arrays and pointers, `* sizeof(element)` is done for you by the compiler, and the generated assembly will reflect the `sizeof(element)` factor. But what does this have to do with strings? `sizeof(char) == 1` – Chris Lutz Aug 17 '09 at 23:39
  • Just because the compiler does it for you and the size of a char happens to be one doesn't mean the implementation isn't important. – Erix Aug 17 '09 at 23:52
  • `sizeof(char)` doesn't _happen_ to be 1 - it's specified in the standard. – Chris Lutz Aug 18 '09 at 00:56
  • You're right, nobody should ever know this information because characters are one byte. – Erix Aug 18 '09 at 03:03
  • I'm not saying it doesn't matter, I'm saying it has nothing to do with strings. – Chris Lutz Aug 18 '09 at 04:44
0

A common error is:

char *p;
snprintf(p, 3, "%d", 42);

p is never initialized, so snprintf writes through an indeterminate pointer. That is undefined behavior: it may seem to work for a while, and then funny things happen (welcome to the jungle).

Explanation

With char *p you only allocate space for the pointer itself (sizeof(char *) bytes) on the stack, not for any characters. The right thing here is to point p at a buffer you have allocated, for example an array whose size is fixed at compile time:

char buf[12];
char *p = buf;
snprintf(p, sizeof(buf), "%d", 42); 
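
When the needed size is not known at compile time, C99 lets snprintf report it: called with a NULL buffer and size 0, it returns the number of characters it would have written. A sketch of that two-pass pattern, with int_to_string as a made-up helper name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats value into exactly enough heap memory.
 * Returns NULL on failure; the caller must free() the result. */
char *int_to_string(int value)
{
    int needed = snprintf(NULL, 0, "%d", value); /* length without the '\0' */
    if (needed < 0) {
        return NULL;
    }
    char *text = malloc((size_t)needed + 1);
    if (text != NULL) {
        snprintf(text, (size_t)needed + 1, "%d", value);
    }
    return text;
}
```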
dfa
  • Your first example should never work, even if you use less than `sizeof(*p)` bytes, because `snprintf` won't copy a string into the pointer, but the memory that the pointer _points to_. A `char *p` is not the same as a `char p[]`. In your second example, `*p` is superfluous, as `buf` could be passed to `snprintf` directly to make the code clearer. – Chris Lutz Aug 17 '09 at 23:36
  • the former example works, try it with a compiler :) In the latter I know that `*p` is superfluous, but it serves the purpose of showing "how to allocate memory" – dfa Aug 17 '09 at 23:39
  • It works because `*p`, upon declaration, holds a random value, and therefore points to a random segment of memory that may happen to be writable, thus giving you the illusion that it works when writing small amounts of text to it, and thus why it breaks when you try to write too much. – Chris Lutz Aug 17 '09 at 23:45
  • Also, just tried it on my compiler. First example: `Bus error`. (GCC 4.0, OS X Leopard) – Chris Lutz Aug 17 '09 at 23:51
  • you can try with an older compiler on an older UNIX; probably macosx randomizes segments to minimize bad things like buffer overflows, etc – dfa Aug 17 '09 at 23:56
  • Things like "works on your machine" or "you can make it work on an older UNIX" does NOT mean it's correct. Your code is undefined behavior according to the C standard, which means it might work, it might crash, or it might erase your hard drive. That's what undefined behavior is. – Adam Rosenfield Aug 18 '09 at 01:20
0

Pointers and arrays, while having similar syntax, are not at all the same. Given:

char a[100]; char *p = a;

For the array a, there is no pointer stored anywhere. sizeof(a) != sizeof(p): for the array it is the size of the whole block of memory; for the pointer it is the size of the pointer. This becomes important if you use something like sizeof(a)/sizeof(a[0]). Also, you can't ++a, and you can make the pointer a 'const' pointer to 'const' chars, but the array can only be an array of 'const' chars, in which case you have to initialize it where it is declared. And so on.
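
The place this bites most often is a function call, where the array is passed as a pointer and the size information is lost. A small sketch; the macro ARRAY_LEN and the function size_seen_by_callee are names chosen for this example only.

```c
#include <stddef.h>

/* Element count of a true array. Silently wrong if handed a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inside a function an array argument has already become a pointer, so
 * sizeof reports the pointer's size, not the caller's array size. */
size_t size_seen_by_callee(char *buffer)
{
    return sizeof buffer;
}
```

For char a[100], ARRAY_LEN(a) is 100 in the scope where a is declared, while size_seen_by_callee(a) returns sizeof(char *); this is why functions that fill a buffer need a separate size parameter.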

0

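If possible, use strlcpy (instead of strncpy) and strlcat. Both are BSD extensions, not standard C, so check that your environment provides them.

Even better, to make life a bit safer, you can use a macro such as the following (valid only when dst is a true array, not a pointer, because it relies on sizeof(dst)):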

#define strlcpy_sz(dst, src) (strlcpy(dst, src, sizeof(dst)))
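
Where strlcpy is not available, a stand-in is short enough to write by hand. The sketch below follows the usual strlcpy contract as I understand it (always terminate, return the source length); copy_bounded is a made-up name chosen to avoid clashing with a system strlcpy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for BSD strlcpy: copies at most dst_size - 1
 * characters, always terminates when dst_size > 0, and returns strlen(src)
 * so the caller can detect truncation (result >= dst_size).
 */
size_t copy_bounded(char *dst, const char *src, size_t dst_size)
{
    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```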
Sint