  char arr[]={'a','b','c'};
  int len=strlen(arr);

I know that strlen stops when the char pointer reaches a '\0', and that it returns the distance between the array's first address and the address of that '\0'. But when I created the string this way, I did not include a '\0'. So the pointer may keep moving, looking for a '\0', and in the process may perform an out-of-bounds access. Why didn't this code produce a warning or an error?

Rachid K.

3 Answers


strlen() only works correctly for zero-terminated character arrays, and what you have is not one.

The value your program ends up with in len depends entirely on what happens to be in memory starting at the address arr + 3.

If there's a zero byte there, you'll get 3. If other data comes before the first zero, you'll get a larger number. If you're unlucky and there's no zero before the end of your process's readable memory, your program will crash on the out-of-bounds read. In every case, reading past the end of arr is undefined behavior, so none of these outcomes is guaranteed.

For instance, the program

#include <stdio.h>
#include <string.h>

int main(void) {
  char blarr[] = {'d', 'e', 'f'};
  char arr[] = {'a', 'b', 'c'};
  int len = strlen(arr);
  printf("%d\n", len);
  return 0;
}

may print 6, depending on how the compiler places arr and blarr on the stack.

Your compiler doesn't warn about anything because, as far as types go, your program is valid: you're passing a char* to strlen, and that's fine. The compiler just isn't smart enough to detect that this char* doesn't point to a zero-terminated string.
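
To make this concrete, here is a minimal sketch of two ways to avoid the problem, assuming you control the array's definition: add the terminator explicitly, or skip strlen and take the element count from sizeof. The function names terminated_length and element_count are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Option 1: store the terminator explicitly, so strlen is well defined. */
size_t terminated_length(void) {
    char arr[] = {'a', 'b', 'c', '\0'};
    return strlen(arr);   /* stops at the '\0' stored inside arr */
}

/* Option 2: keep the array unterminated and never hand it to strlen.
 * sizeof gives the element count, but only in the scope where arr is
 * declared as an array (not after it has decayed to a pointer). */
size_t element_count(void) {
    char arr[] = {'a', 'b', 'c'};
    return sizeof arr / sizeof arr[0];
}
```

Both functions return 3, but only the array in the first one is safe to pass to strlen.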

AKX

So the pointer may keep moving, looking for a '\0', and in the process may perform an out-of-bounds access.

Yes, that's exactly what happened.

Why didn't this code produce a warning or an error?

Because the declaration

char arr[] = {'a','b','c'};

is perfectly valid. You haven't given the compiler any indication that you intend to use arr as a string.

A somewhat more interesting case is if you were to write

char arr[3] = "abc";

Due to a historical quirk, this is perfectly legal C, although it creates exactly the same array arr and will have exactly the same problem if you pass it to strlen. Here, though, I believe some compilers will warn, and it would certainly be an appropriate warning, since the feature is debatable and rarely used deliberately.
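
A quick way to see the difference is to compare sizeof for the two forms of initialization. This is only an illustrative sketch, and the function names are invented here.

```c
#include <stddef.h>

/* char arr[3] = "abc": exactly three elements, so the literal's
 * terminating '\0' is dropped. Legal in C, an error in C++. */
size_t size_with_explicit_bound(void) {
    char arr[3] = "abc";
    return sizeof arr;
}

/* char arr[] = "abc": the compiler sizes the array from the literal,
 * including the terminating '\0'. */
size_t size_with_deduced_bound(void) {
    char arr[] = "abc";
    return sizeof arr;
}
```

The first function returns 3 and the second returns 4. Only the second array is a valid argument for strlen.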

Steve Summit
  • "Here, though, some compilers will warn" Which compilers would that be? I just tested latest gcc, clang, icx with `-Wall -Wextra`, no warnings. Same in MSVC. This quirk is particularly nasty just because most compilers will _not_ warn. – Lundin Sep 15 '22 at 12:35
  • @Lundin Good question. I had already tried with gcc and clang with -Wextra and was surprised to get no warning. I *thought* I'd heard of this warning existing, but maybe it was just wishful thinking. – Steve Summit Sep 15 '22 at 12:40
  • @SteveSummit it is an error in C++. – n. m. could be an AI Sep 15 '22 at 12:50
  • @SteveSummit (*cough*) Maybe you need to use a [C/C++ compiler](https://meta.stackoverflow.com/a/420340/584518) ;) j/k But well, these kind of super-subtle differences between C and C++ would be one of the many reasons to try to keep the languages separated on SO. – Lundin Sep 15 '22 at 13:01
  • I just found a C compiler that does warn. Codewarrior for MCUs in C mode gives "resulting string is not zero terminated" as a warning. This compiler is known to be picky in general. – Lundin Sep 15 '22 at 13:29

Oftentimes, it is about managing expectations.

Let's start with a small thought experiment (or travel back to the early days of computing), where there are no programming languages, just machine code. There, you would (with CPU-specific instructions) write something like this to represent a string and compute its length:

arr: db 'a','b','c'
strlen:                         ; RDI (pointer to string) -> RAX (length of string)
                                ; RAX length counter and return value
                                ; CL used for null character test
        xor RAX, RAX            ; set RAX to 0
strlen_loop:
        mov cl, [rdi]           ; load CL with the byte pointed to by argument
        test cl,cl
        jz strlen_loop_done
        inc rdi                 ; look at next byte in argument
        inc rax                 ; increment the length counter
        jmp strlen_loop
strlen_loop_done:
        ret                     ; rax holds the length of the zero-terminated string

Compared to that, writing the same function in C is much simpler.

  • We do not have to care about register allocation (which register does what).
  • We do not rely on the instruction set of a specific CPU.
  • We do not have to look up the calling conventions or ABI for the target system (argument passing conventions etc.).

size_t strlen(const char* s) {
  size_t l = 0;
  while (*s) {
    l++;
    s++;
  }
  return l;
}

The convention that "strings" are just pointers to chars (bytes) ending with a null terminator is admittedly quite arbitrary, but it comes with the C programming language. It is just a convention. The compiler itself knows almost nothing about it (it does know to add a terminating null to string literals). But when you call strlen(), it cannot distinguish a string from a plain byte array. Why? Because there is no dedicated string type.

As such, the C version is just about as clever as the assembly version I gave above: it relies on the C string convention. The assembler does not check it, and neither does the C compiler, because, let's be honest, C's main accomplishments are the bullet items I gave above.
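
Since neither the assembler nor the compiler checks the convention, the caller has to. As a sketch, assuming the caller knows the buffer's capacity, the standard memchr function can bound the search for the terminator. bounded_length is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the string length if a '\0' occurs
 * within the first cap bytes of buf, otherwise returns cap.
 * It never reads beyond buf[cap - 1]. */
size_t bounded_length(const char *buf, size_t cap) {
    const char *end = memchr(buf, '\0', cap);
    return end ? (size_t)(end - buf) : cap;
}
```

For the unterminated three-element array from the question, bounded_length(arr, sizeof arr) returns 3 without reading past the array. A result equal to cap tells the caller that no terminator was found.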

So if you manage your expectations about C, think of it as a slightly abstracted, glorified assembly language.

If you are annoyed by the C string convention (after all, strlen is O(n) in time complexity), you can still come up with your own string type, perhaps like this:

typedef struct String_tag {
  size_t length;
  char data[];
} String_t;

Then write yourself helpers (to create a string on the heap) and macros (to create a string on the stack with alloca or something similar), and build your own string library around that type.
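
As a starting point for that exercise, here is one possible sketch of a heap helper for such a type. The names string_new and string_length are hypothetical, and the struct is repeated so the example compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

typedef struct String_tag {
    size_t length;
    char data[];   /* flexible array member, needs no terminator */
} String_t;

/* Hypothetical helper: copies n bytes from src into a newly allocated
 * String_t. Returns NULL if allocation fails. Release with free(). */
String_t *string_new(const char *src, size_t n) {
    String_t *s = malloc(sizeof *s + n);
    if (s == NULL) {
        return NULL;
    }
    s->length = n;
    memcpy(s->data, src, n);
    return s;
}

/* O(1): the length is stored, not searched for. */
size_t string_length(const String_t *s) {
    return s->length;
}
```

With this layout, string_new("abc", 3) stores exactly three bytes plus the length, and string_length never scans memory looking for a terminator.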

If you are just getting started with C, instead of tackling something bigger, I think this would be a good exercise for learning the language.

BitTickler