
I would like to parse a string containing an undefined number of suffixes, all separated by `;`

example 1: « .txt;.jpg;.png »

example 2: « .txt;.ods;_music.mp3;.mjpeg;.ext1;.ext2 »

I browsed the web and wrote this piece of code, which doesn't work:

char *suffix[MAX]; /* will contain pointers to the different suffixes */
for (i = 0; i < MAX ; i++)
{
    suffix[i] = NULL;
    if (suffix_str && sscanf(suffix_str,"%[^;];%[^\n]",suffix[i],suffix_str) < 1)
        suffix_str = NULL;
}

After the first iteration, the result of sscanf is 0. Why didn't it read the content of the string?

How should a string containing an undefined number of elements be parsed? Is `sscanf` a good choice?

Jav
  • What is the `suffix` variable supposed to be? – Milind Dumbare Feb 10 '15 at 08:22
  • Why not split it up using e.g. [`strtok`](http://en.cppreference.com/w/c/string/byte/strtok) instead? – Some programmer dude Feb 10 '15 at 08:23
  • `char *suffix[MAX]` is an array of `char*` pointing to the different suffixes or being `NULL` when no more suffixes are found. The whole string is parsed after the loop to replace `;` by `\0`. – Jav Feb 10 '15 at 08:25
  • As for your current solution, are the pointers pointing to valid memory? Have you actually allocated memory for the strings? Because if you just have an array of uninitialized pointers, you will have [*undefined behavior*](http://en.wikipedia.org/wiki/Undefined_behavior). – Some programmer dude Feb 10 '15 at 08:26
  • Although I couldn't find anything official, I expect that using the same string as the input and an output is not allowed. – user3386109 Feb 10 '15 at 08:26
  • I did allocate the memory for the string : `printf("%s",suffix_str)` before the loop displays the correct string – Jav Feb 10 '15 at 08:27
  • You're using the same buffer for your source data *and* a target argument. C11 (ISO/IEC 9899:2011) § 7.21.6.7p2: "If copying takes place between objects that overlap, the behavior is undefined." So much for that. Back to the drawing board. – WhozCraig Feb 10 '15 at 08:27
  • @WhozCraig I already used an equivalent `sscanf` pattern where the src and dst were the same. Are you sure it is copying? What I actually want is to change the value of `suffix_str` to step through the string (I have another `char *` pointer to the complete string) – Jav Feb 10 '15 at 08:28
  • Please note that code which seems to work is one of the ways undefined behavior can show up. The important word here is ***seems*** to work. – Some programmer dude Feb 10 '15 at 08:28
  • Show us the declaration/allocation for `suffix`. – Klas Lindbäck Feb 10 '15 at 08:30
  • Does `suffix_str` somehow address two *different* non-overlapping buffers simultaneously in the single call to `sscanf` where it is presented as both the source buffer *and* the second target argument? Yeah, I'm sure. I'm not convinced that is your *only* problem, but seeing it invokes undefined behavior, you can't rely on anything afterward regardless. Fix it: use an intermediate temporary buffer and copy-back on success. – WhozCraig Feb 10 '15 at 08:31
  • I updated the code to make the `suffix` array clearer. – Jav Feb 10 '15 at 08:35
  • ... and I too think `strtok` is a better road to travel regardless (agreeing with Joachim). – WhozCraig Feb 10 '15 at 08:36
  • And that's not how you should use `sscanf()`. If you want to store pointers, use `strtok()` or simply write your own loop to mark the pointers. – askmish Feb 10 '15 at 08:36
  • Thank you for the `strtok` solution, I'm implementing it. – Jav Feb 10 '15 at 08:37
  • @WhozCraig & @Joachim Pileborg : It works like a charm. Thank you. `char *suffix_token = strtok(suffix_str,";"); for(i = 0; i < MAX ; i++) { suffix[i] = suffix_token; if (suffix_token) { /* if suffix_token is not NULL, continue parsing */ suffix_token = strtok(NULL,";"); } }` – Jav Feb 10 '15 at 09:13
  • I'll let you write the answer to the question if you like; otherwise, I will write it later. – Jav Feb 10 '15 at 09:15
  • It's a sad day when changing to `strtok` is an improvement :) – M.M Feb 10 '15 at 09:30
  • @MattMcNabb I had to wash my hands after typing that answer. – WhozCraig Feb 10 '15 at 09:31

2 Answers


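First, as covered in the comments, you're invoking undefined behavior by using the same buffer as both the source input and a destination target for `sscanf`. Per the C standard, that isn't allowed. In addition, `suffix[i]` is `NULL` at the point where it is passed to `sscanf`, so the first conversion has no storage to write into.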

The correct function to use for this would likely be `strtok`. A very simple example appears below.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char line[] = ".txt;.ods;_music.mp3;.mjpeg;.ext1;.ext2";
    size_t slen = strlen(line); // worst case
    char *suffix[slen/2+1], *ext;
    size_t count=0;

    for (ext = strtok(line, ";"); ext; ext = strtok(NULL, ";"))
        suffix[count++] = ext;

    // show suffix array entries we pulled
    for (size_t i=0; i<count; ++i)
        printf("%s ", suffix[i]);
    fputc('\n', stdout);
}

Output

.txt .ods _music.mp3 .mjpeg .ext1 .ext2 

Notes

  • This code sizes the array for the worst case, half the string length plus one, which corresponds to a list of single-character suffixes each separated by the delimiter.
  • The suffix array contains pointers into the now-sliced-up original line buffer. The lifetime of usability for those pointers is therefore only as long as that of the line buffer itself.

Hope it helps.

WhozCraig

There are several ways to tokenize a C string. In addition to using `strtok` and `sscanf`, you could also do something like this:

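char *temp = suffix_str;
char *suffix[MAX];
for (int i = 0; i < MAX; i++)
{
    size_t j = 0;
    char buf[32];

    /* copy one suffix into buf, leaving room for the terminator */
    while (*temp != '\0' && *temp != '\n' && *temp != ';' && j < sizeof buf - 1)
        buf[j++] = *temp++;
    buf[j] = '\0';

    /* drop whatever did not fit, then step over the delimiter */
    while (*temp != '\0' && *temp != '\n' && *temp != ';')
        temp++;
    if (*temp == ';') temp++;

    /* once the input is exhausted, the remaining entries are empty strings */
    suffix[i] = malloc(strlen(buf) + 1);
    if (suffix[i] == NULL)
        break; /* allocation failed: suffix[i] and later entries are unusable */
    strcpy(suffix[i], buf);
}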
askmish