0

I am trying to parse the phone number out of a sip uri, or if the string is just a number, returning that. Basically, I want to chop off the @ and anything after it if it exists.
I wrote a small function using strtok() but the function always returns NULL.
Can anyone tell me what I'm doing wrong here?

char* GetPhoneNumber(const char* sCallee) {
    char* buf = (char*)malloc(strlen(sCallee) + 1);
    strcpy(buf, sCallee);
    char *p = strtok (buf, "@");
    char *q = strtok (p, ":");
    if (buf) {
        free(buf);
    }
    return q;
}

int main() {
    const char* raw_uri = "2109999999@10.0.0.1";
    char* number = GetPhoneNumber(raw_uri);
    if (number == NULL) {
        printf("I am screwed! %s comes out null!", raw_uri);
    }
    char* second = GetPhoneNumber("2109999999");
    if (second == NULL) {
        printf("This does not work either.");
    }
}
Yunnosch
  • 26,130
  • 9
  • 42
  • 54
Cinder Biscuits
  • 4,880
  • 31
  • 51
  • Could you possibly explain the format(s) you are attempting to recognize with that code? – rici Feb 28 '18 at 21:59
  • 1
    `while (q != NULL)` then `if (q != NULL) break;` ?? Suggests `char *q = strtok (p, ":");` fails, but since you never check that `char *p = strtok (buf, "@");` succeeds, it's hard to tell. – David C. Rankin Feb 28 '18 at 22:01
  • @rici Basically it's a ten digit number followed by @ followed by an IP address (could be ipv4 could be ipv6, really doesn't matter, anything after the @ is not required.) – Cinder Biscuits Feb 28 '18 at 22:03
  • @DavidC.Rankin Removing the while loop and it still returns NULL. if `char *p = strtok (buf, "@");` doesn't succeed, it should just return the full string, no? I was hoping this would handle the case of there being just a number or a number@address. – Cinder Biscuits Feb 28 '18 at 22:06
  • so basicaly you want everything up to the @ is that correct? – pm100 Feb 28 '18 at 22:08
  • Where do the colons come in, then? – rici Feb 28 '18 at 22:09
  • You've edited your post and removed the "interesting" bits, now you question makes no sense with your new code. Because you also have an error in the way you free the memory, I updated my answer to address that. – Pablo Feb 28 '18 at 22:13
  • 2
    SAMPLE INPUT and EXPECTED OUTPUT in the question NOT COMMENTS - please – pm100 Feb 28 '18 at 22:13
  • 2
    The returned pointer `q` comes from the `free`d `buf` which is bad. – Weather Vane Feb 28 '18 at 22:15
  • @DavidC.Rankin unless `p` is `NULL`, the `strtok(p,...)` call shouldn't return `NULL`, if there are no delimiters in the string, `strtok` returns `p`. – Pablo Feb 28 '18 at 22:16
  • Did you know that: `strtok` is [considered obsolete](https://www.freebsd.org/cgi/man.cgi?query=strtok&sektion=3&apropos=0&manpath=freebsd#DESCRIPTION) by FreeBSD. Microsoft states that the function poses a [security risk](https://msdn.microsoft.com/en-us/library/2c8d19sb.aspx#Remarks) – Myst Feb 28 '18 at 22:20
  • 1
    Yes, WV points out your main problem: even if GetPhoneNumber *did* parse things correctly, it returns a pointer to freed memory. This is C: you can't make a function that returns a string, because C doesn't really have a string type. You need to have the caller either allocate space and pass it in, or else have the caller free it when it's done. – Lee Daniel Crocker Feb 28 '18 at 22:33

2 Answers2

3

edit

This

If it's returning NULL because the while loops ends when q is NULL. So I assume that the q = strtok (NULL, ":"); also returns NULL and that's why it leaves the loop.

makes no sense anymore, since you've edited your question and removed the code in question.

end edit

Regardless, you are using it wrong, though.

strtok returns a pointer to the original string plus an offset that marks the beginning of the next token. The pointer is at an offset of buf. So when you do free(buf), you are making the pointers returned by strtok also invalid.

You should also check first if malloc returns NULL and then try to parse it. Checking of malloc returning NULL after the parsing is wrong. Also you would need to make a copy of the value you are returning.

char* GetPhoneNumber(const char* sCallee) {
    char* buf = malloc(strlen(sCallee) + 1);
    if(buf == NULL)
        return NULL;

    strcpy(buf, sCallee);

    char *p = strtok (buf, "@");
    if(p == NULL)
    {
        // no token found
        free(buf);
        return NULL;
    }

    char *q = strtok (p, ":"); // makes no sense after your edit
                               // I don't see any colons in your input
                               // but I leave this to show you how
                               // it would be done if colons were present

    if(q == NULL)
    {
        // no token found
        free(buf);
        return NULL;
    }

    char *copy = malloc(strlen(q) + 1);
    if(copy == NULL)
    {
        free(buf);
        return NULL;
    }

    strcpy(copy, q);
    free(buf);

    return copy;
}

Also when you call this function, you have to remember to free the pointer returned by GetPhoneNumber.

edit2

To be honest, I don't see why you even use strtok if the number comes before @. You can use strchr instead:

char* GetPhoneNumber(const char* sCallee) {

    if(sCallee == NULL)
        return NULL;

    char *p = strchr(sCallee, '@');

    if(p == NULL)
        return NULL; // wrong format

    char *q = calloc(1, p - sCallee + 1);

    if(q == NULL)
        return NULL;

    strncpy(q, sCallee, p - sCallee);

    // q already \0-terminated because of calloc

    return q;
}
Pablo
  • 13,271
  • 4
  • 39
  • 59
3

Your code works fine, you just cannot free(buf) before the return or you release the memory holding number and second leading to Undefined Behavior. You further need to validate each step. Suggest something like:

(Note: updateded to protect against a leading '@' in sCallee or a leading ':' in p resulting in NULL being returned)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *GetPhoneNumber(const char* sCallee) 
{
    if (*sCallee == '@') {  /* protect against leading '@' */
        fprintf (stderr, "invalid sCallee - leading '@'\n");
        return NULL;
    }

    char* buf = malloc (strlen(sCallee) + 1);
    if (!buf) {         /* if you allocate/validate */
        perror ("malloc - buf");
        return NULL;
    }
    strcpy(buf, sCallee);

    char *p = strtok (buf, "@");    /* get first token with '@' */
    if (!p) {   /* validate */
        fprintf (stderr, "error: strtok with '@' failed.\n");
        return NULL;
    }
    if (*p == ':') {        /* protect against leading ':' */
        fprintf (stderr, "invalid p - leading ':'\n");
        free (buf);
        return NULL;
    }
    char *q = strtok (p, ":");      /* get first token with ':' */

//     free(buf);

    return q;
}

int main () {

    const char* raw_uri = "2109999999:abc@10.0.0.1";
    char* number = GetPhoneNumber(raw_uri);

    if (number == NULL) {
        printf("I am screwed! %s comes out null!\n", raw_uri);
    }
    else {
        printf ("number: %s\n", number);
        free (number);
    }

    char* second = GetPhoneNumber("2109999999");

    if (second == NULL) {
        printf("This does not work either.\n");
    }
    else {
        printf ("second: %s\n", second);
        free (second);
    }
}

(note: There is no need to cast the return of malloc, it is unnecessary. See: Do I cast the result of malloc?)

Example Use/Output

$ ./bin/strtokpnum
number: 2109999999
second: 2109999999

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ valgrind ./bin/strtokpnum
==24739== Memcheck, a memory error detector
==24739== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==24739== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==24739== Command: ./bin/strtokpnum
==24739==
number: 2109999999
second: 2109999999
==24739==
==24739== HEAP SUMMARY:
==24739==     in use at exit: 0 bytes in 0 blocks
==24739==   total heap usage: 2 allocs, 2 frees, 35 bytes allocated
==24739==
==24739== All heap blocks were freed -- no leaks are possible
==24739==
==24739== For counts of detected and suppressed errors, rerun with: -v
==24739== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.


Protect Against All Corner Cases

Though not part of the question, there are possible corner cases such as multiple leading '@' and multiple leading ':' in the secondary string. If you are going to protect against the possibility of having multiple leading '@' delimiters or multiple leading ':' in the secondary string, then simply use the multiple allocation method. Adding any additional checks and you might as well just walk a pointer down sCallee.

char *gettok (const char *s, const char *d1, const char *d2)
{
    char *buf = malloc (strlen (s) + 1),
        *p = NULL,
        *q = NULL,
        *s2 = NULL;

    if (!buf) { /* validate allocation */
        perror ("malloc - buf");
        return NULL;
    }
    strcpy (buf, s);    /* copy s to buf */
    if (!(p = strtok (buf, d1))) {  /* if token on d1 fails */
        free (buf);                 /* free buf */
        return NULL;
    }
    if (!(q = strtok (p, d2))) {    /* if token on d2 fails */
        free (buf);                 /* free buf */
        return NULL;
    }

    /* allocate/validate return */
    if (!(s2 = malloc (strlen (q) + 1))) {
        perror ("malloc - s2");
        return NULL;
    }
    strcpy (s2, q); /* copy token */

    free (buf);     /* free buf */

    return s2;      /* return token */
}

In that case you would simply call

gettok ("@@@@@:::::2109999999:abc@10.0.0.1", "@", ":');

and you are protected.

(and if you are getting strings like that from your sip uri, fix that process)

David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • I don't like your frees in the main. If the pointer you return were the second token, you were freeing a pointer that malloc didn't return, but an offset of it. I know that in this example, you are returning the first token anyway, but it would be better if you return a `malloc` copy anyway. – Pablo Feb 28 '18 at 22:18
  • You allocate for both `number` and `second` in `GetPhoneNumber`. There is no other way to do it. You cannot return a pointer from `GetPhoneNumber` to memory that has been freed -- UB. (recall `q` points to memory in `buf` allocated at the beginning of `GetPhoneNumber`, if you free there, you have nothing to return). – David C. Rankin Feb 28 '18 at 22:19
  • No, I'm talking about `char *q = strtok (p, ":");`. Here in this case `q` will be the same as `buf`, but let's say the number `p` were `"abc:1234`" and you do `strtok(p, ":"); q = strtok(NULL, ":");` then `q` will be at an offset of `buf` and `free(q)` would be UB. That's way I said, it would be better to return a `strdup/malloc` copy of `q` instead. – Pablo Feb 28 '18 at 22:22
  • It can't be `char *q = strtok (NULL, ":");` or you will be searching in the region of `p` after the `@` symbol. You want to search within the token returned by `p`. (I'll wager my 1 malloc against your 2 `:)` – David C. Rankin Feb 28 '18 at 22:26
  • I think you don't get what I mean. In this example `q` will be equals to `buf`, so no problem there. But consider a different example: `"Tina:1151551551@192.168.0.1"` as an input string. `p` would point to `buff` after `p = strtok(buf, "@");` and would be the string `"Tina:1151551551`". To get to the number, you would have to do `strtok(p, ":"); q = strtok(NULL, ":");` and now `q` points to `buf+5`. If you return `q` and the caller does `free(result_from_GetPhoneNumber);`, then it's doing `free(buf+5)` which is UB. – Pablo Feb 28 '18 at 22:31
  • @Pablo - I see what you mean, but if you only call `char *p = strtok (buf, "@");` once and `char *q = strtok (p, ":");` once, you will always be pointing at the beginning of the block allocated for `buf`. (if you call it any more than once -- then you have to copy as you did) I can always invent data that requires a second call, but I can only go by the data given. – David C. Rankin Feb 28 '18 at 22:34
  • @DavidC.Rankin: Not if the buffer starts with one or more colons (or at-signs). "The start of the next token is determined by scanning forward for the next nondelimiter byte in str." – rici Feb 28 '18 at 22:36
  • Wait, wait, wait... `p` scans for `@`. You don't care how many colons there are. `q` scans for the **first** token delimited by a `':'` -- again, you don't care how many there are. If we invent a new Question saying I want the token **after** the first `':'`, then of course you have to call `strtok` twice and in that case you need to alloc/copy before return because `q` would no longer point to the **beginning** of the allocation -- but that isn't the Question asked. – David C. Rankin Feb 28 '18 at 22:39
  • 1
    @DavidC.Rankin that's why I said that in this example it's fine, because `q` happens to point to the same location as `buf`. If the OP uses this code and then tries an input like I did, then the OP will get an error and probably wouldn't know why. That's why I said, in interest of correctness, it would be better to return a `malloced` copy, **even** if in this particular example you wouldn't need it. – Pablo Feb 28 '18 at 22:40
  • Yes, in that case I totally agree with you. – David C. Rankin Feb 28 '18 at 22:40
  • @DavidC.Rankin I didn't reinvent the question, the OP had code where he/she actively parsed for colons, later changed the code and remove that part. That's why I assume that the real string the OP might be parsing may look in reality like the one "I invented". – Pablo Feb 28 '18 at 22:43
  • @DavidC.Rankin: I repeat: on the first call, strtok will return the address of the first character in the buffer which is not in the delimiter set. So if `p = strtok(buf, "@")` does not return `buf` if the first character of `buf` is an `'@'`. Subsequently, `q = strtok(p, ":")` will not return `p` if the first character of `p` is a colon. Consequently, if `buf` starts with one or more at-signs or colons, `q` will not be `buf` and `free(q)` will be UB. – rici Feb 28 '18 at 22:52
  • That is a case that I did not protect against. It could well occur. So in that case you are correct. I will note that. – David C. Rankin Feb 28 '18 at 22:56
  • I'm sorry, don't want to be peaky, but I think your check is also not 100% right, it would fail if you have more than one leading colon. It think it's best not to check `*p == ':'` but rather `char *q = strtok(p, ":"); if(q == NULL) { ... };`. – Pablo Feb 28 '18 at 23:35
  • Why you want to keep changing the data is beyond me. Like I said, I can invent any string that will cause the single allocation case to fail. That wasn't question asked. However, just for the sake of completeness, and just to protect against a screwed up sip uri retrieval that returns 50 leading `'@'` followed by 100 leading `':'`, allocating separately for the final return is the way to go. If your sip uri intermixes the 50 leading `'@'` and 100 leading `':'` before the number -- you ain't getting it. – David C. Rankin Mar 01 '18 at 00:02
  • @DavidC.Rankin hey, don't get mad at me, I'm not trying to upset/bother you. I was only pointing out that on some inputs, the code would not work as intended. Because the data comes from an external source you cannot trust, you should add these extra checks, even when for the example strings provided by the OP you wouldn't need them. Sorry for trying to help. – Pablo Mar 01 '18 at 00:07
  • I'm not at all, I appreciate the discussion and I do see the point, but there is a certain level of frustration associated with answering 'Question A' only to be told it doesn't work for 'Question B'. Now, is it fair to say, that won't work for 'Question B' -- sure it is, but at some point you have to simply answer the question asked and not one that could have been asked. It would be the same as me pointing out that your solution fails for `"@:@2109999999@"` because there is always the possibility you get a second `'@'` before the digits... `:)` – David C. Rankin Mar 01 '18 at 07:05