
I want to extract words from a string.

I don't want to use strtok because it modifies my source string. I am also wondering whether it is possible to do this without loops.

Here is my code sample. It successfully reads the first word, but the second and third remain empty.

char source[] = "XXX|YYY|ZZZ";

char word1[10] = "";
char word2[10] = "";
char word3[10] = "";

sscanf( source, "%[^|]s|%[^|]s|%s", word1, word2, word3 );

Is it really possible to do this using sscanf, or am I on the wrong path?

UPDATE:

It looks like user3121023's answer does not work for empty words.

char source[] = "XXX||ZZZ";

char word1[10] = "";
char word2[10] = "";
char word3[10] = "";

sscanf( source, "%[^|]|%[^|]|%s", word1, word2, word3 );

The third word remains empty. What should I do in this situation?

Jean-François Fabre
walruz
  • @user3121023: Thank you! It works! – walruz Mar 30 '18 at 09:02
  • @Jean-François Fabre: you are right, but I spent an hour trying to find an answer to my question on the internet and it just did not show up. – walruz Mar 30 '18 at 09:07
  • note: this was closed as duplicate of "https://stackoverflow.com/questions/8141208/sscanf-with-delimiter-in-c" but here there's the "empty word" constraint. So I reopened. – Jean-François Fabre Mar 30 '18 at 09:31

1 Answer

Your sscanf() format does not handle empty substrings, nor does it protect against potential buffer overflows if the target arrays are smaller than the source string.
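To see where the format gives up, it helps to check the return value of sscanf(), which is the number of successful conversions. A %[^|] conversion must match at least one character, so an empty field causes a matching failure and sscanf() stops there. The sketch below is only an illustration: count_converted is a hypothetical helper name, and the %9 width is an assumption that matches the 10-byte arrays from the question.

```c
#include <stdio.h>

/* Hypothetical helper: returns how many fields sscanf() converted.
   The width 9 keeps each field within a 10-byte buffer (assumed size). */
int count_converted(const char *source) {
    char word1[10] = "", word2[10] = "", word3[10] = "";

    /* Each %9[^|] needs at least one non-'|' character to succeed. */
    return sscanf(source, "%9[^|]|%9[^|]|%9[^|]", word1, word2, word3);
}
```

With "XXX|YYY|ZZZ" this returns 3, but with "XXX||ZZZ" it returns 1: the second conversion fails on the empty field, so the third word is never read.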

Here is a solution with strcspn() and a utility function strcpy_n:

#include <string.h>

char *strcpy_n(char *dest, size_t size, const char *src, size_t n) {
    if (size > 0) {
        if (n >= size)
            n = size - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return dest;
}

...

    char source[] = "XXX||ZZZ";
    char word1[10], word2[10], word3[10] = "";

    size_t pos = 0, len;

    len = strcspn(source + pos, "|");
    strcpy_n(word1, sizeof(word1), source + pos, len);
    pos = pos + len + (source[pos + len] == '|');

    len = strcspn(source + pos, "|");
    strcpy_n(word2, sizeof(word2), source + pos, len);
    pos = pos + len + (source[pos + len] == '|');

    len = strcspn(source + pos, "|");
    strcpy_n(word3, sizeof(word3), source + pos, len);
    pos = pos + len + (source[pos + len] == '|');

...

You can wrap the above code into another utility function, getfield(), to factor out the repeated steps:

/* returns non zero if there are more fields to parse */
int getfield(char *dest, size_t size, const char *source, size_t *ppos) {
    int has_separator = 0;
    size_t pos = *ppos;
    size_t len = strcspn(source + pos, "|");
    strcpy_n(dest, size, source + pos, len);
    pos += len;
    has_separator = (source[pos] == '|');
    *ppos = pos + has_separator;
    return has_separator;
}

 ...

    char source[] = "XXX||ZZZ";
    char word1[10], word2[10], word3[10];

    size_t pos = 0;

    /* parse the fields, empty and missing fields are set to "" */
    getfield(word1, sizeof(word1), source, &pos);
    getfield(word2, sizeof(word2), source, &pos);
    getfield(word3, sizeof(word3), source, &pos);

...
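If the number of fields is not fixed at three, the same strcspn() approach can be put in a loop, although the question asked to avoid loops. The following is a minimal sketch under stated assumptions: split_fields and FIELD_SIZE are hypothetical names introduced here for illustration, and fields longer than the buffer are silently truncated.

```c
#include <string.h>

#define FIELD_SIZE 10  /* assumed buffer size, same as the arrays above */

/* Hypothetical helper: splits source on '|' into at most max_fields buffers.
   Returns the number of fields stored; empty fields are stored as "". */
size_t split_fields(const char *source, char fields[][FIELD_SIZE], size_t max_fields) {
    size_t count = 0;
    size_t pos = 0;

    while (count < max_fields) {
        size_t len = strcspn(source + pos, "|");              /* field length */
        size_t n = len < FIELD_SIZE ? len : FIELD_SIZE - 1;   /* truncate to fit */

        memcpy(fields[count], source + pos, n);
        fields[count][n] = '\0';
        count++;

        pos += len;
        if (source[pos] != '|')
            break;   /* reached the end of the string */
        pos++;       /* skip the separator */
    }
    return count;
}
```

The source string is never modified, so this keeps the main advantage over strtok.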
chqrlie