1

I have seen the standard implementation of strlen using pointer as:

int strlen(char * s) {
  char *p = s;
  while (*p!='\0')
    p++;
  return p-s;
}

I get this works, but when I tried to do this using 3 more ways (learning pointer arithmetic right now), I would want to know whats wrong with them?

  1. This is somewhat similar to what the book does. Is this wrong?

    int strlen(char * s) {
      char *p = s;
      while (*p)
        p++;
      return p-s;
    }
    
  2. I though it would be wrong if I pass an empty string but still gives me 0, kinda confusing since p is pre increment: (and now its returning me 5)

    int strlen(char * s) {
      char *p = s;
      while (*++p)
        ;
      return p-s;
    }
    
  3. Figured this out, does the post increment and returns +1 on it.

    int strlen(char * s) {
      char *p = s;
      while (*p++)
        ;
      return p-s;
    }
    
user694733
  • 15,208
  • 2
  • 42
  • 68
user7703770
  • 91
  • 1
  • 8

3 Answers3

1

1) Looks fine to me. I personally prefer the explicit comparison against '\0' so that it's clear you didn't mean to (for example) compare p to the NULL pointer in situations where it's not clear from context.

2) When your program runs, the area of memory known as the stack is uninitialized. Local variables live there. The way you wrote your program puts p in the stack (if you made it const or used malloc, it would almost certainly live elsewhere). What happens when you look at *p is that you then peek at the stack. If the string is length 0, this is the same as char p[1] = {0}. Pre-incrementing looks at the byte immediately after the \0, so you're looking at undefined memory. Here be dragons!

3) I don't think there's a question there :) As you see, it always returns one more than the correct answer.

Addendum: You can also write this using a for-loop, if you prefer this style:

size_t strlen(char * s) {
    char *p = s;
    for (; *p != '\0'; p++) {}
    return p - s;
}

Or (more error-prone-ly)

size_t strlen(char * s) {
    char *p = s;
    for (; *p != '\0'; p++);
    return p - s;
}

Also, strlen can't return a negative number, so you should use an unsigned value. size_t is even better.

lungj
  • 721
  • 5
  • 16
  • this is what my main looks like and now, it's giving me 5. main() { char A[ ]= "" ; printf("%d",strlen(A)); } – user7703770 Apr 24 '17 at 03:13
  • Ah... I see. What's happening is you're incrementing the pointer before checking to see what's there. What you end up checking against is data in the stack. Since that's not defined, you might get any value for the empty string until you hit a \0. – lungj Apr 24 '17 at 03:20
0

Version 1 is fine - while (*p != '\0') is equivalent to while (*p != 0), which is equivalent to while (*p).

In the original code and version 1, the pointer p is advanced if and only if *p is not 0 (IOW, you're not at the end of the string).

Versions 2 and 3 advance p regardless of whether *p is 0 or not. *p++ evaluates to the character p points to, and as a side effect advances p. *++p evaluates to the character following the character p points to, and as a side effect advances p. Therefore, versions 2 and 3 will always advance p past the end of the string, which is why your values are off.

John Bode
  • 119,563
  • 19
  • 122
  • 198
-1

One issue you will run into when you compare the performance of strlen replacement functions is their performance will suffer compared to the actual strlen function for long strings? Why? strlen processes more than one-byte per iteration in searching for the end of string. How can you implement a more efficient replacement?

It's not that difficult. The basic approach is to look at 4-bytes per iteration and adjust the return based on where within those 4-bytes the nul-byte is found. You could do something like the following (using array indexing):

size_t strsz_idx (const char *s) {
    size_t len = 0;
    for(;;) {
        if (s[0] == 0) return len;
        if (s[1] == 0) return len + 1;
        if (s[2] == 0) return len + 2;
        if (s[3] == 0) return len + 3;
        s += 4, len += 4;
    }
}

You can do the exact same thing using pointers and masks:

size_t strsz (const char *s) {
    size_t len = 0;
    for(;;) {
        unsigned x = *(unsigned*)s;
        if((x & 0xff) == 0) return len;
        if((x & 0xff00) == 0) return len + 1;
        if((x & 0xff0000) == 0) return len + 2;
        if((x & 0xff000000) == 0) return len + 3;
        s += 4, len += 4;
    }
}

Either way, you will find a 4-byte comparison each iteration will give you performance equivalent to strlen itself.

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • First loop doesn't check the 4 bytes at the time. It uses more complex loop which has more branches and additions than the original. Second loop is worse: It **violates aliasing rules** and results in *undefined behaviour*. Also, this answer doesn't answer the question. Downvoted. – user694733 Apr 24 '17 at 08:18
  • There is nothing worse than a *little* knowledge about the *strict aliasing* rule. What is the rule about `casting to-and-from char*`? There is no violation there and if that is the basis of your ding, you are 100% wrong. Why don't you compile the code with `-Wall -Wextra -pedantic` and find out? I don't mind a downvote when I'm wrong, but I don't expect one when I'm right either. – David C. Rankin Apr 24 '17 at 08:30
  • Casting address of any object type to `char*` is allowed, this is specific exception in standard. However, you cannot do same in reverse and dereference, unless you original and final types match (`unsigned * -> char* -> unsigned*` is legal, `char[] -> unsigned*` is not). N1570 6.5 p7 lists possible types for aliasing, and none of the 6 apply in this case (not even the last). Testing is not sufficient to check if something is legal in C, because of undefined behaviour it might *seem* to work sometimes. `*(unsigned*)s` is the classic example of this. – user694733 Apr 24 '17 at 09:30
  • Your wrong, you can cast anything to `char *` and `char *` to anything. I'm quite happy to go though the standard with you and the comments to the main rules. I have done it many times. The way to address this situation with honor and integrity, is simply to say, I misread the standard and casts involving `char *` are fine. No big deal. The standards involved are Sections 6.5.6 and 6.5.7. (along with the notes and comments to each) – David C. Rankin Apr 24 '17 at 09:45
  • *"you can cast anything to `char *`"* Yes. *"and char * to anything"* But you cannot dereference it, unless `char*` was converted from pointer to 'anything'. **Practical example:** `*(uint64_t*)s` can invoke fault handler on ARM Cortex-M3 due to unaligned access. Alignment restrictions of CPU hardware were one of the reasons why these aliasing rules were introduced in C. `*(unsigned*)s` might seem to work on x86 which has been very relaxed about the alignment, but code is still UB. Which means that optimizing compiler can (for example) replace initialization with `unsigned x = 0;`. – user694733 Apr 24 '17 at 10:02
  • When we start talking about "qualifications" to the casts, sure there will be alignment issues under some circumstances -- NONE RELEVANT HERE. Sec. 6.5 (6) reads ***"The effective type of an object for an access to its stored value is the declared type of the object"*** (what is the type for `'s'` - how it is accessed?, what is the type for `'x'`, how is it accessed?) ***"If a value is stored into an object having no declared type...***" (does not apply). ***"If a value is copied..."*** (does not apply) The *strict aliasing* rule is not violated anywhere. (from the C11 n1570 draft) – David C. Rankin Apr 24 '17 at 13:27
  • I am not denying anything in that paragraph. But please do read the next paragraph too: *"An object shall have its stored value accessed **only by an lvalue expression that has one of the following types**"* This is then followed by list of 6 points: **1.** The 'object' is `char[]`, and lvalue expression on dereference is `unsigned` (`x` is irrelevant on dereference). **These are not compatible types** (see 6.2.7 p1 and 6.7.2). **2.-5.** (not applicable) **6.** Again, lvalue expression is `unsigned`. None these give you free pass on aliasing. – user694733 Apr 24 '17 at 14:15
  • And you miss the last and most important one ***— a character type***. There is NOTHING wrong casting a `char *` as `unsigned x = *(unsigned*)s;` NOTHING. (if there were the SSL library would not exist) There is NOTHING wrong then LOOKING at each byte in `'x'` as an `unsigned` type. What do you think the expression `x & 0xff` is evaluating? What **Pray Tell** SPECIFICALLY, are you alleging IS the CODE that VIOLATES the *strict aliasing* rule. What line?? Look, if you don't want to believe me, and can't read the rule properly, crank every warning on your compiler up and prove it to yourself. – David C. Rankin Apr 24 '17 at 15:04
  • It's this line: `unsigned x = *(unsigned*)s;`. There are 3 operations here: 1) Cast from `char*` to `unsigned*`, 2) dereference from `unsigned*` to `unsigned`, and 3) initialization of `x` of with `unsigned`. We can ignore the last, that is legal. Number 1, the pointer conversion, could give us implementation defined behaviour, but let's assume that we are on a system where that's not an issue. This leaves us with number 2: **the dereference**. Cont... – user694733 Apr 25 '17 at 07:05
  • Now lets look at the last rule again: *"An object shall have its stored value accessed only by an lvalue expression that has one of the following types — a character type."* The object in this case is an array, with type `char[]`. The lvalue expression uses the `unsigned` type for the access. **The *'a character type'* applies to the latter.** It's rule that supports 6.3.2.3 p7. It's a rule that allows you to view any object as a `char[]`. It's **not** a rule that allows you to view `char[]` through any object type. Cont... – user694733 Apr 25 '17 at 07:05
  • *"crank every warning on your compiler up"* Unfortunately, using cast before the dereference disables the compiler warnings. This is why explicit casting is so dangerous. Again, you cannot use testing to determine if something is legal. Existence libraries is not proof either. There are unfortunately many which have this bug on their serialization code. – user694733 Apr 25 '17 at 07:05
  • [Don't take just my word for it.](http://stackoverflow.com/q/7630150/694733) And [another one](http://stackoverflow.com/q/3246228/694733). – user694733 Apr 25 '17 at 07:26