1

I was trying to understand the code below, which was asked in a recent interview.

#include <stdio.h>
#include <string.h>

int main() {
    char *ptr = "Linux";
    char a[] = "Solaris";
    strcat(a, ptr);
    printf("%s\n", ptr);
    printf("%s\n", a);
    return 0;
}

Execution trace:

gcc -Wall -g prog.c
gdb a.out

(gdb) p ptr
$15 = 0x400624 "Linux"
(gdb) p a+1
$20 = 0x7fffffffe7f1 "olarisLinux"
(gdb) p a
$21 = "SolarisL"
(gdb) p a+0
$22 = 0x7fffffffe7f0 "SolarisLinux"
(gdb)
$23 = 0x7fffffffe7f0 "SolarisLinux"
(gdb) p ptr
$24 = 0x78756e69 <error: Cannot access memory at address 0x78756e69>
(gdb)

I have a few questions:

  1. Does strcat remove the string literal from the original location, as accessing ptr gives a segmentation fault?

  2. Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

chqrlie
Rndp13
  • 5
    Change `char a[] = "Solaris";` to e.g. `char a[50] = "Solaris";`, otherwise there won't be room for the appended string and you will get UB. – Paul R Aug 08 '18 at 11:57
  • Thanks for your comments, basically aware of how to make the prog work, but was just looking for why is it behaving this way. – Rndp13 Aug 08 '18 at 11:58
  • 2
    It's undefined behaviour, so it isn't behaving in a specific way. This time it trashed the value of `ptr` with "inux". Next time it might cause your pets to explode. It's best to avoid writing code that doesn't work and focus on writing code that does work. – Chris Turner Aug 08 '18 at 12:20
  • Please only ask one question per post. Why a program behaves in a certain way is a totally different question than why does GDB show some output. – Gerhardh Aug 08 '18 at 13:06

3 Answers

2

If I understand your question correctly, you are aware that the program has undefined behavior because a cannot hold the string "Solaris" concatenated with "Linux".

So the answer you're looking for is not "This is undefined behavior" but rather:

why is it behaving this way

When dealing with Undefined behavior we can't give a general explanation of what's going on. It may do different things on different systems or different things for different compilers (or compiler versions) and so on.

Therefore it's often said that it makes no sense to try to explain what is going on in a program with undefined behavior. And well - that's correct.

However - sometimes you can find an explanation for your specific system - just remember that it is specific for your system and in no way universal.

So I changed your code to add some debug print:

#include<stdio.h>
#include<string.h>

int main()
{
    char *ptr = "Linux";
    char a[] = "Solaris";
    printf("   a = %p\n", (void*)a);
    printf("&ptr = %p\n", (void*)&ptr);
    printf(" ptr = %p\n", (void*)ptr);

    // Print the data that ptr holds
    unsigned char* p = (unsigned char*)&ptr;

    printf("\nBefore strcat\n");
    printf("  a:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", (unsigned char)a[i]);
    printf("\n");

    printf("  ptr:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(p+i));
    printf("\n");

    strcat(a,ptr);

    printf("\nAfter strcat\n");
    printf("  a:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", (unsigned char)a[i]);
    printf("\n");

    printf("  ptr:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(p+i));
    printf("\n\n");

    printf("%s\n", a);

    printf("%s\n", ptr);

    return 0;
}

On my system this generates:

   a = 0x7ffff3ce5050
&ptr = 0x7ffff3ce5058
 ptr = 0x400820

Before strcat
  a:
53 6f 6c 61 72 69 73 00
  ptr:
20 08 40 00 00 00 00 00

After strcat
  a:
53 6f 6c 61 72 69 73 4c
  ptr:
69 6e 75 78 00 00 00 00

SolarisLinux
Segmentation fault

Here is the output with some comments added:

   a = 0x7ffff3ce5050   // The address where the array a is stored
&ptr = 0x7ffff3ce5058   // The address where ptr is stored. Notice 8 higher than a
 ptr = 0x400820         // The value of ptr

Before strcat
  a:
53 6f 6c 61 72 69 73 00 // Hex dump of a gives Solaris\0
  ptr:
20 08 40 00 00 00 00 00 // Hex dump of ptr is the value 0x0000000000400820 (little endian system)

// Here strcat is executed

After strcat
  a:
53 6f 6c 61 72 69 73 4c // Hex dump of a gives SolarisL
  ptr:
69 6e 75 78 00 00 00 00 // Oops.. ptr has changed! It's not a valid pointer value anymore
                        // As a string it is inux\0

SolarisLinux            // print a
Segmentation fault      // print ptr crashes because ptr doesn't hold a valid pointer value

So on my system the explanation is:

a is located in memory just before ptr, so when strcat writes out of bounds of a, it overwrites the value of ptr. Consequently, the program crashes when it tries to use ptr as a valid pointer.
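For contrast, here is a minimal sketch of a concatenation that refuses to write out of bounds. The helper name concat_checked is hypothetical (it is not a standard library function), and it assumes dst already holds a NUL-terminated string inside its buffer:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the C library: appends src to dst only
 * if the result, including the terminating '\0', fits in dstsize bytes.
 * Returns 0 on success and -1 (leaving dst unchanged) otherwise.
 * Assumes dst is already NUL-terminated within its buffer. */
int concat_checked(char *dst, size_t dstsize, const char *src)
{
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);

    /* Need dlen + slen + 1 <= dstsize, written so nothing can wrap around. */
    if (dlen >= dstsize || slen >= dstsize - dlen)
        return -1;                      /* would overflow dst */

    memcpy(dst + dlen, src, slen + 1);  /* copy src together with its '\0' */
    return 0;
}
```

With char a[] = "Solaris" (8 bytes), concat_checked(a, sizeof a, "Linux") returns -1 and leaves a alone, so nothing next to a gets overwritten. With char a[16] = "Solaris" it succeeds.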

So for your specific questions:

1) Does strcat remove the string literal from the original location, as accessing ptr gives a segmentation fault?

No. It's the value of ptr that has been overwritten. The string literal is most likely untouched.

2) Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

This is a guess, nothing more. My guess is that gdb knows that a is 8 bytes, so printing a directly prints only 8 bytes. When printing a + 0, my guess is that gdb treats a + 0 as a pointer (and therefore can't know the object size), so gdb keeps printing until it sees a zero terminator.
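The C type system draws the same distinction, which is probably what gdb is following: a has type char[8], while a + 0 has type char *. A small sketch (the function name array_vs_pointer_sizes is made up for illustration, and this shows C's rules, not gdb's implementation):

```c
#include <stddef.h>

/* Hypothetical demo function: reports the size of the array from the
 * question and the size of the pointer expression a + 0. */
void array_vs_pointer_sizes(size_t *array_size, size_t *pointer_size)
{
    char a[] = "Solaris";

    *array_size = sizeof a;          /* 8: seven letters plus '\0' */
    *pointer_size = sizeof(a + 0);   /* sizeof(char *): the array decayed to a pointer */
}
```

The first value is always 8 for this array. The second is the platform's pointer size, and the expression no longer carries any information about how many chars it points to.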

Support Ukraine
1

If the question is "I know it's wrong, but why did it do that?", there are sort of two ways of answering it.

(1) Undefined behavior means anything can happen. Taking an array of size 8 and writing 13 characters to it is a really wrong thing to do. You're overwriting five bytes of memory that were presumably in use for something else, so overwriting them means... anything can happen. (But now I'm repeating myself.)

I know you asked the question in all sincerity, but I have to say, to me these questions always sound like: "I ran through a busy intersection when the sign said Don't Walk. A blue car ran over me, and I broke my left leg. I don't understand why. Why wasn't I hit by a red truck? Why didn't I break my right arm?"

(2) Let's look at a likely layout of the memory allocated for this program:

            +----+----+----+----+----+----+----+----+
         a: | S  | o  | l  | a  | r  | i  | s  | \0 |
            +----+----+----+----+----+----+----+----+

            +----+----+----+----+
       ptr: | 78 | 56 | 34 | 12 |
            +----+----+----+----+

            +----+----+----+----+----+----+
0x12345678: | L  | i  | n  | u  | x  | \0 |
            +----+----+----+----+----+----+

Here I'm imagining that the string "Linux" is stored at address 0x12345678, so ptr holds that value. I'm imagining that your machine uses 32-bit pointers. (These days, though, it might well use 64.) I'm imagining that your machine uses "little endian" byte order, meaning that the bytes making up the pointer ptr are stored in the opposite order in memory than you might expect.

You said that after calling strcat, a printed out the concatenated string you expected, but the program crashed when you tried to print ptr. Let's change the printout of ptr to

printf("%p: %s\n", (void *)ptr, ptr);

Before the call to strcat, this will print something like

0x12345678: Linux

But here's what the call to strcat actually does:

            +----+----+----+----+----+----+----+----+
         a: | S  | o  | l  | a  | r  | i  | s  | L  |
            +----+----+----+----+----+----+----+----+

            +----+----+----+----+
       ptr: | i  | n  | u  | x  | \0
            +----+----+----+----+

Now, the printout of ptr is going to be something like

0x78756e69: Segmentation violation (core dumped)

You overwrote the pointer ptr, so it no longer points to address 0x12345678 where the string "Linux" is stored, it now points to location 0x78756e69, where those hex digits come from the characters i n u x. If you don't have permission to access address 0x78756e69, you'll get a crash. If you do have permission to access location 0x78756e69, you'll get some garbage string printed.
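To see the arithmetic, here is a small sketch that reassembles four bytes in little-endian order, the way a 32-bit little-endian machine would read them back as a pointer value. The function name le32_from_bytes is hypothetical, and the example assumes an ASCII character set:

```c
#include <stdint.h>

/* Hypothetical helper: interprets b[0..3] as a little-endian 32-bit value,
 * i.e. b[0] is the least significant byte. */
uint32_t le32_from_bytes(const unsigned char b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

Feeding it the bytes 'i', 'n', 'u', 'x' (0x69, 0x6e, 0x75, 0x78 in ASCII) gives 0x78756e69, the same value gdb reported for ptr.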

Now, with all of that said, it's important to note that this is not necessarily what will happen. I've assumed that the compiler stored the pointer ptr right after the array a in memory. That's one possibility, but obviously not the only possibility. If the compiler happened to store ptr somewhere else, then something else would get overwritten by inux, and something else might go wrong. Or nothing might go wrong. (In other words, you might get hit by the blue car, or you might get hit by the red truck, or you might get lucky and make it across the street without being hit at all.)


Addendum: I've just looked at your post more carefully, and I see that gdb told you that ptr had changed to 0x78756e69, and that it couldn't access the memory there. But now we know where that strange value 0x78756e69 probably came from. :-)

Steve Summit
0

Well, here we've got a pointer mistake.

I'll try to be clear:

Constant strings (like "Linux" and "Solaris") are stored in a specific memory area of the program. For your program, among other strings (error messages, for instance), there should be an area containing something like: "Linux\0Solaris\0%s\n\0%s\n\0".

When you write:

char *ptr = "Linux";
char a[] = "Solaris";

ptr is assigned the address of the 'L' char, and you are given 8 * sizeof(char) bytes of memory on the stack, into which "Solaris\0" is copied.

When you concatenate those two strings, since you never created a new memory space (with malloc or char str[50], for instance), you ask strcat to write past the end of the 8 bytes reserved for a, into stack memory used for something else. This kind of programming mistake is called a buffer overflow (here, a stack buffer overflow).
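As a rough sketch of the malloc route mentioned above, the result can be built in freshly allocated memory that is large enough for both strings. The name concat_alloc is hypothetical, and the sketch ignores the (unlikely) case where the two lengths added together exceed SIZE_MAX:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated string holding s1 followed
 * by s2, or NULL if allocation fails. The caller must free() the result. */
char *concat_alloc(const char *s1, const char *s2)
{
    size_t len1 = strlen(s1);
    size_t len2 = strlen(s2);
    char *result = malloc(len1 + len2 + 1);   /* +1 for the terminating '\0' */

    if (result == NULL)
        return NULL;

    memcpy(result, s1, len1);                 /* first string, without its '\0' */
    memcpy(result + len1, s2, len2 + 1);      /* second string with its '\0' */
    return result;
}
```

For "Solaris" and "Linux" this allocates 13 bytes, which is exactly what the 8-byte array a could not provide.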

Here gdb tries its best to display the strings.

(gdb) p ptr
$15 = 0x400624 "Linux"

Pointer to the static string area, displayed correctly.

(gdb) p a+1
$20 = 0x7fffffffe7f1 "olarisLinux"

Pointer to stack displayed as you would expect

(gdb) p a
$21 = "SolarisL"

An array of 8 chars: gdb knows the size, so it displays only the first 8 chars.

(gdb) p a+0
$22 = 0x7fffffffe7f0 "SolarisLinux"

Pointer to stack (gdb doesn't know the size since you do pointer arithmetic)

(gdb) p ptr
$24 = 0x78756e69 <error: Cannot access memory at address 0x78756e69>

This one is tricky. Notice that ptr does not hold the same value as the first time you printed it. There is a possibility that you wrote over the value of ptr at some point (as you wrote somewhere on the stack that you shouldn't have).

1) Does strcat remove the string literal from the original location, as accessing ptr gives a segmentation fault?

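Nope, the string literal at its original location is not overwritten (string literals are typically stored in read-only memory). It is ptr itself that changed.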

2) Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

It's a debugger, written to avoid certain kinds of errors, so when it knows the size of an object, it reads only what should be read.

  • 1
    *Actually, `char *ptr` and `char a[]` are the exact same thing* No, they are most definitely **not** "the exact same thing". `char *ptr = "Linux";` creates a pointer-to-char variable named `ptr` and initializes it with the address of the string literal `"Linux"`. Attempts to modify the memory *pointed to by `ptr`* can fail. But `char a[] = "Solaris";` creates an actual array of `char` initialized with the characters `"Solaris\0"`. This string can be safely modified, such as by `a[0]='s';` – Andrew Henle Aug 08 '18 at 13:34
  • See [Why do I get a segmentation fault when writing to a string initialized with “char \*s” but not “char s\[\]”?](https://stackoverflow.com/questions/164194/why-do-i-get-a-segmentation-fault-when-writing-to-a-string-initialized-with-cha) – Andrew Henle Aug 08 '18 at 13:34
  • You are right, I edit my post, these were not the correct words. – Pierre Podevin Aug 08 '18 at 13:39
  • Everything deserves a downvote when you like to downvote ;-) I have edited my post since; I did make a mistake when rereading it. This post is the result of several edits (as every post should be), and I made a mistake in one of them ^^ – Pierre Podevin Aug 08 '18 at 13:42
  • I removed this not so useful part of my post. You are right that it does not help beginners. Maybe you can help me with this: is it possible to change the address of a (to make it point to another array or something like that)? – Pierre Podevin Aug 08 '18 at 14:14
  • 2
    @PierrePodevin No, it is not possible to change the address of `a` to make it point to another array. If you want to be able to change where something points, that's why you use a pointer instead of an array. In any case, `a` is not a pointer, and it's either incorrect or badly misleading to say that it "points" anywhere. These are all basic points; if you do a search you'll find that the question has been asked (and answered) hundreds of times on Stackoverflow already. – Steve Summit Aug 08 '18 at 14:37
  • I rarely use array in C. I only use them declaring the form type var_name[number_of_cases]. Never with no size value. I find it misleading. At the end of the day, it's all pointer anyway. So to me, either you use a buffer of fix size `char buff[SIZE];` or you have dynamic memory `char *str = malloc(sizeof(char) * nm_element)` – Pierre Podevin Aug 08 '18 at 14:58
  • 1
    @PierrePodevin: It is **not** *all pointer anyway*: `char a[] = "Solaris";` is exactly equivalent to `char a[8] = "Solaris";`, a fixed size array. As a matter of fact, in C, all arrays have fixed sizes. The size may be determined at run time as in `int size = 8; char array[size];`, but once defined, the array size is fixed. – chqrlie Aug 08 '18 at 16:15
  • @PierrePodevin I'm sorry if this sounds harsh, but if you rarely use arrays and don't understand them well, please don't try to give advice about them to others: you're too likely to mislead. – Steve Summit Aug 08 '18 at 17:37
  • Nah, you didn't understand me well on that point. I do use arrays, like all the time. It's just that I don't use them in this exact syntax that to me is confusing. Either I use a fixed size buffer, a static string or a dynamically memory area (used as array). Then it's only a matter of how you see things. I do know C in a really fundamental way. So I really know my way around these kind of problems. I can even guess in what kind of memory is a string depending on its address (namely the address of the first char). An array is always of fixed size because of how and where it is created. – Pierre Podevin Aug 09 '18 at 08:04