2

I am trying to make a small function to get a string between two tags. But I get a segfault on str[len -3] = '\0';

Is it not possible to add a null-termination to the string passed and then send a pointer back?

Is it bad practice to change the index of the pointer instead of copying it over to a buffer and send it back?

Do I get a memory leak from the 3 bytes never getting freed?

/*
    format for a message
    <m>Hello world!</>13594750394883323106
    <m>"msg"</><checksum>
*/
//returns the string beetween tags
char *GetMessage(char *str) {
    int len = strlen(str);
    for (int i = 0; i < len; i++) {
        if (str[i] == '<' && str[i + 1] == 'm' && str[i + 2] == '>') {
            if (str[len - 3] == '<' && str[len - 2] == '/' && str[len - 1] == '>') {
                str[len - 3] = '\0';
                return &str[3];
            }
        }
    }
    return NULL;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189
Jacob
  • 371
  • 3
  • 12
  • 2
    `return &str[3];` -> `return str + 3;` – Mad Physicist Oct 17 '16 at 19:49
  • Without seeing the code that frees the memory allocated to the string, we can only guess. Is there some software telling you that there is a memory leak? If so, can you show us the message? – Mark Plotnick Oct 17 '16 at 19:50
  • @MarkPlotnick Assume free() on the returned pointer. – Jacob Oct 17 '16 at 19:55
  • Doesn't this line (`if(str[i] == '<' && str[i+1] == 'm' && str[i+2] == '>')`) create problems when you're testing the last two characters of the string? For example, when` i == len - 1` (last character of the string), you'll be reference to characters that are beyond the end of the string if it's == `<' – Travis Griggs Oct 17 '16 at 19:56
  • Yes @TravisGriggs that might be a problem if the string is corrupt. I am not sure how I will handle that – Jacob Oct 17 '16 at 19:59
  • In the first paragraph, I think you mean `str[len - 3] = '\0';` rather than `str[str -3] = '\0';`. – Keith Thompson Oct 17 '16 at 19:59
  • Are you trying to free the memory returned by `&str[3]`? – Amadeus Oct 17 '16 at 20:01
  • If `str` points to a `malloc`-allocated chunk of memory, then `free(&str[3])` has undefined behavior; you can only `free()` a pointer value that was returned by `malloc` or equivalent (or a null pointer). – Keith Thompson Oct 17 '16 at 20:01
  • @Amadeus Yes later I will try to free that memory. I have a buffer for an incoming string, that buffer will be sent to this function. The returned value will be evaluated but not modification. I will then free that buffer – Jacob Oct 17 '16 at 20:02
  • 1
    Why does it return `str + 3`?. Shouldn't it return `str + i + 3`? – Travis Griggs Oct 17 '16 at 20:02
  • How do you handle the string containing multiple messages? – dbush Oct 17 '16 at 20:02
  • 3
    It would be useful to see how this function is called. – dbush Oct 17 '16 at 20:03
  • If you're going to free the sub fragment and/or the original independent from each other, then you should use strdup() or similar to get a second copy of the substring, so you can free them independently – Travis Griggs Oct 17 '16 at 20:03
  • @Jacob to prevent the problems described by Travis Griggs, you need to stop the loop when [i+2] == [len-3]. To explain the segfault, the input str is maybe considered as const. – J. Piquard Oct 17 '16 at 20:05
  • Can you give us enough code to reproduce the problem? – David Schwartz Oct 17 '16 at 20:07
  • 3
    Chances are you're passing a string literal to this function and you can't modify string literals reliably — it often crashes (though the behaviour is undefined and worse things than crashes can happen). – Jonathan Leffler Oct 17 '16 at 20:09
  • @JonathanLeffler Aha. I did not know that. I am just calling this function with a GetMessage("Hello world!>"); string when i got the problem. But that explains it – Jacob Oct 17 '16 at 20:16
  • @TravisGriggs this line `if(str[i] == '<' && str[i+1] == 'm' && str[i+2] == '>'` works fine because if `i` is `len-1`, `str[i+1]` is `\0` (i.e. is not `m`) and `str[i+2]` is not evaluated thanks to the short circuit nature of `&&`. Stopping earlier as @J. Piquard suggested is a good option, though. – Remo.D Oct 17 '16 at 21:09
  • @Mad Physicist: `return &str[3];` is the same as `return str + 3;` with the former version being *preferable* in many cases. – AnT stands with Russia Oct 17 '16 at 21:18
  • @Mad Physicist: `if(str[i] == '<' && str[i+1] == 'm' && str[i+2] == '>')` is **not** equivalent to `strcmp(str + i, "") == 0`. It can be replaced with `strncmp`, but not with what you suggested. – AnT stands with Russia Oct 17 '16 at 21:19
  • @AnT. Good call. Should be `strncmp(str + i, "", 3)`. – Mad Physicist Oct 17 '16 at 21:24

3 Answers3

3

To better reasoning about this, let's to draw the memory layout of your string. If I got it correctly it is something like:

                     111111
           0123456789012345...
        -> xxxxx<m>Hi</>yyy...\0

Now you want to pass a pointer to the first character of your string to GetMessage() and print the first message. Something like

fullmsg ="....";
m = fullmsg;
m = GetMessage(m);
printf("msg: %s\n",m);
... // Advance m

Of course you can't do fullmsg=GetMessage(fullmsg) or weird things may happen (memory leak being the least :) ).

When you've found the <m> tag your situation is:

                     111111
           0123456789012345...
    str -> xxxxx<m>Hi</>yyy...\0
                ^             ^
                i             len

Which shows that returning str+3 does not do what you want. Your return value should be str+i+3.

In the same vein, it's not str[len-3] that you should put to \0. Just imagine the effect on: GetMessage("x<m>aa</>yzyy"). The character in position len-3 is z. Not what you wanted, I guess.

What you could do is to use another index to find the end of the message:

      for (j = i+1; j<len-2; j++) {
        if (str[j] == '<' && str[j+1] == '/' && str[j+2] == '>') {
           // end of message found!!!!
        }
      }

So that when you have found the end of the message your situation is:

                     111111
           0123456789012345...
    str -> xxxxx<m>Hi</>yyy...\0
                ^    ^        ^
                i    j        len

I wish I could tell you that you could simply do str[j]='\0' and return str+i+3 but, unfortunately I can't. If you do it and pass a literal string (m=GetMessage("Hi there!")` you will get a coredump as the memory used for a string between quotes is read-only.

A possible solution is to change the semantic of your GetMessage() a little bit:

    // returns the length of the message if the string starts with <m>
    int GetMessage(char *str) {
       int len = 0;
       if (str[0]=='<' && str[1]=='m' && str[2]=='>') {
         str += 3;
         while (str[0] != '\0') {
            if (str[0]=='<' && str[1]=='/' && str[2] == '>')
              return len;
            str++;
         }
       }
       return 0;
    }

Now, when you want to print the message you can do something like:

    fullmessage = "xxxx<m>Hi</>yyyyy";
    m = fullmessage;
    l = 0;

    /* skip until you find a '<m>' tag */
    while (m[0] != '\0' && ((l=GetMessage(m)) == 0) m++;

    /* l can be 0 here if there was no message in the string */
    if (l>0) printf("msg = %.*s",l,m+3);

I didn't test it fully but I hope you got the idea.

Remo.D
  • 16,122
  • 6
  • 43
  • 74
1

You are getting a crash on str[len-3] = '\0'; because you are trying to write to a read-only location. String literals used as values can be placed in read-only storage, attempting to modify them invokes undefined behavior.

You cannot pass constant strings to this function.

There is a bug in your code: return &str[3]; should be return &str[i + 3];.

Furthermore, your code does not handle the examples in the comments because the </> substring is not at the end.

Here is a simplified version:

#include <string.h>

char *GetMessage(char *str) {
    str = strstr(str, "<m>");
    if (str != NULL) {
        str += 3;
        char *p = strstr(str, "</>");
        if (p != NULL)
            *p = '\0';
    }
    return str;
}

Note that the pointer returned by GetMessage() points inside the argument string. You cannot pass it to free() to deallocate the string, this would invoke undefined behavior. Only the original value returned by malloc() can be passed to free().

chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • Why did you suppose the string passed as parameter was read only? Do you work nearby? Are you the guy who possed the problem to @jacob ??? The parameter string is defined as `char *`, and not `const char *`, so there's nothing illegal in changing the original string, IMHO – Luis Colorado Oct 19 '16 at 06:44
  • @LuisColorado: I know this from reading Jacob's comment: *I did not know that. I am just calling this function with a GetMessage("Hello world!>"); string when i got the problem. But that explains it*. The code as written is not *illegal*, but it does make the assumption that the parameter string is modifiable, which a string literal isn't. It is a pity the compiler would not complain by default about the `const` qualification mismatch. Invoking the compiler with increased warning level (ie `gcc -Wall -W` or equivalent) would produce a useful diagnostic message. – chqrlie Oct 21 '16 at 04:48
  • well, don't despair... I use to be sarchastic when I make some comment like this one. Don't take me seriously. I agree with you in the sentence _that it's a pity the compiler says nothing about it_. But that's also a reminiscence of old legacy code to be compilable. – Luis Colorado Oct 21 '16 at 09:39
0

To resume answers for the questions.

1) Is it not possible to add a nulltermination to the string passed and then send a pointer back?

It turned out that the string passed as a literal was the problem in this case. Jonathan Leffler pointed this out in the comments and that was exactly how I tested the function.

GetMessage("<m>Hello world!</>");

When tested in conjunction with other functions answering without literals it worked well.

GetMessage(ReadSerialData());

2) Is it bad practice to change the index of the pointer instead of copying it over to a buffer and send it back?

This appears to be of preference. But should not caus any problems.

3) I get a memory leak from the 3 bytes never getting freed?

This was really good explained here


Thanks for all the inputs!

Community
  • 1
  • 1
Jacob
  • 371
  • 3
  • 12