8

How to split a string into tokens by '&' in C?

nbro
trrrrrrm
  • 1
    http://stackoverflow.com/questions/266357/tokenizing-strings-in-c – Laurynas Biveinis Jan 19 '10 at 07:32
  • Looks like that question is about splitting a literal string (although the question is low on details, saying "doesn't work"). This may or may not be what the OP wants. We should give him a chance to explain that. Is the string being split a literal string, or is it read-only? – Alok Singhal Jan 19 '10 at 07:38

4 Answers

13

strtok / strtok_r

char *token;
char *state;

for (token = strtok_r(input, "&", &state);
     token != NULL;
     token = strtok_r(NULL, "&", &state))
{
    ...
}
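To make the loop above concrete, here is a minimal sketch that collects the tokens into an array. The helper name split_on_amp and the max_tokens parameter are made up for illustration; strtok_r itself is POSIX, not ISO C, so this assumes a POSIX system. The input must be a writable buffer, because each '&' is overwritten with '\0'.

```c
#define _POSIX_C_SOURCE 200809L  /* asks POSIX headers to declare strtok_r */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `input` in place on '&' and stores up to
 * `max_tokens` token pointers in `tokens`. Returns the number stored.
 * `input` must be writable (an array, not a string literal). */
size_t split_on_amp(char *input, char *tokens[], size_t max_tokens)
{
    assert(input != NULL && tokens != NULL);

    size_t count = 0;
    char *state;   /* strtok_r keeps its position here, not in a static */
    char *token;

    for (token = strtok_r(input, "&", &state);
         token != NULL && count < max_tokens;
         token = strtok_r(NULL, "&", &state))
    {
        tokens[count++] = token;   /* points into `input`, no copy is made */
    }
    return count;
}
```

Called on a buffer declared as char buf[] = "a&b&&c", this should store three tokens ("a", "b" and "c"), since strtok_r skips runs of delimiters just as strtok does.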
nbro
R Samuel Klatchko
9

I would do it something like this (using strchr()):

#include <string.h>

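const char *data = "this&&that&other";  /* a string literal, so treat it as read-only */
const char *next;
const char *curr = data;
while ((next = strchr(curr, '&')) != NULL) {
    /* the token is the (next - curr) characters starting at curr; it may be empty */
    curr = next + 1;
}
/* the last token is the rest of the string, starting at curr */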

strchr(const char *s, int c) returns a pointer to the first occurrence of c in s, or NULL if c isn't found in s.
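The loop above leaves the actual processing as comments. One way to fill it in, sketched here with a made-up struct span and function name split_spans, is to record each token as a start pointer plus a length, so that nothing is copied and the original string is never written to. It assumes the delimiter is not '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a token is a view into the original string. */
struct span {
    const char *start;  /* first character of the token */
    size_t len;         /* number of characters; 0 for an empty token */
};

/* Hypothetical helper: records up to `max` tokens of `data`, split on
 * `delim`, without modifying `data`. Returns the number recorded. */
size_t split_spans(const char *data, char delim, struct span out[], size_t max)
{
    assert(data != NULL && out != NULL);

    size_t count = 0;
    const char *curr = data;
    const char *next;

    while (count < max && (next = strchr(curr, delim)) != NULL) {
        out[count].start = curr;
        out[count].len = (size_t)(next - curr);  /* curr up to next - 1 */
        count++;
        curr = next + 1;                         /* step over the delimiter */
    }
    if (count < max) {  /* the last token: everything after the final delimiter */
        out[count].start = curr;
        out[count].len = strlen(curr);
        count++;
    }
    return count;
}
```

Because the tokens are not null-terminated, print them with a precision, for example printf("%.*s\n", (int)s[i].len, s[i].start). For "this&&that&other" this should give four tokens, the second of them empty.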

You might be able to use strtok(), but I don't like it, because:

  • it modifies the string being tokenized, so it doesn't work on string literals and is not very useful when you want to keep the string for other purposes. In that case, you must first copy the string to a temporary buffer.
  • it merges adjacent delimiters, so if your string was "a&&b&c", the returned tokens are "a", "b", and "c". Note that there is no empty token after "a".
  • it is not thread-safe, because it keeps its position between calls in hidden static state.
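The first two points can be checked with a small sketch. The helper name count_strtok_tokens and its scratch-buffer parameters are made up for illustration: it copies the text into a caller-supplied buffer, runs strtok() on the copy, and counts the tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() finds in `text`.
 * `text` is copied into `scratch` first, so `text` itself (which may be
 * a string literal) is left alone. Returns 0 if `scratch` is too small. */
size_t count_strtok_tokens(const char *text, const char *delims,
                           char *scratch, size_t scratch_size)
{
    assert(text != NULL && delims != NULL && scratch != NULL);

    size_t len = strlen(text);
    if (len + 1 > scratch_size)
        return 0;
    memcpy(scratch, text, len + 1);  /* the copy includes the '\0' */

    size_t count = 0;
    for (char *t = strtok(scratch, delims); t != NULL; t = strtok(NULL, delims))
        count++;                     /* each call writes '\0' into scratch */
    return count;
}
```

For "a&&b&c" with "&" as the delimiter this should return 3, not 4, and afterwards the scratch buffer reads as just "a" because the first '&' has been replaced by '\0'.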
Alok Singhal
  • I suppose it also depends on the C implementation. On my system the string itself is not modified when I call strtok(). Actually I do not even see how it could. After all it just has to produce pointers to the start of the different tokens within the string. – Cees Meijer Jan 19 '10 at 10:43
  • Cees Meijer: `strtok()` *does* modify the argument string - the delimiter characters are replaced by '\0', so that the strings returned are properly terminated. – caf Jan 19 '10 at 10:58
  • 1
    `strtok` *has* to modify the string if it has to be standards compliant. From the C standard: *If such a character is found, it is overwritten by a null character, which terminates the current token*. This is very explicit. http://opengroup.org/onlinepubs/009695399/functions/strtok.html. Can you post code where `strtok()` doesn't modify the string? – Alok Singhal Jan 19 '10 at 10:59
  • Oops. I suppose you must be right. My mistake. Indeed changing the delimiter to \0 is the only way it could work. And after closer examination of my code (it's an embedded system so inspecting the disassembled code was not that hard) I see this is exactly what happens. – Cees Meijer Jan 20 '10 at 08:05
  • I don't understand how this is going to split the string by `&`. For example, how could you get the first token, `this`, by itself. – nikk wong Jan 30 '19 at 10:33
2

You can use the strtok() function as shown in the example below.

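#include <stdio.h>
#include <string.h>

#define MAX_STRING_LENGTH 128   /* assumed value; not given in the original */
#define MAX_TOKENS 16           /* assumed value; not given in the original */

/// Function to parse a string into separate tokens.
/// Returns the number of tokens stored in pToken.
int parse_string(char pInputString[MAX_STRING_LENGTH], const char *Delimiter,
                 char *pToken[MAX_TOKENS])
{
  int i = 0;
  char *token = strtok(pInputString, Delimiter);

  // Stop at the end of the string or when the array is full.
  while (token != NULL && i < MAX_TOKENS) {
    pToken[i++] = token;
    token = strtok(NULL, Delimiter);
  }
  return i;
}

/// The array pToken[] now contains pointers to the start of each token in the
/// original string, in which strtok() has overwritten each delimiter with '\0'.

int main(void)
{
  char String[MAX_STRING_LENGTH];
  char *pToken[MAX_TOKENS];
  int NrOfParameters;

  sprintf(String, "Token1&Token2");
  NrOfParameters = parse_string(String, "&", pToken);

  if (NrOfParameters >= 2)
    printf("%s, %s\n", pToken[0], pToken[1]);
  return 0;
}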
Cees Meijer
0

For me, the strtok() function is unintuitive and too complicated, so I wrote my own. It takes the string to split, the character that separates the tokens, and a pointer through which it returns the number of tokens found (useful when printing the tokens in a loop). A disadvantage of this function is the fixed maximum length of each token.

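#include <stdlib.h>
#include <string.h>
#define MAX_WORD_LEN 32   //Maximum token length, including the terminating '\0'

//Splits text on split_char. Returns a malloc'ed array of *w_count malloc'ed
//strings, or NULL (with *w_count set to 0) if the input is shorter than two
//characters or an allocation fails. The caller frees each string, then the array.
char **txtspt(const char *text, char split_char, int *w_count)
{
    *w_count = 0;
    size_t len = strlen(text);
    if(len <= 1)
        return NULL;

    char **cpy0 = NULL;
    size_t i, j = 0;
    int k = 0, words = 1;

    //Words counting (a trailing delimiter does not start a new word)
    for(i = 0; i < len; ++i)
    {
        if(text[i] == split_char && text[i + 1] != '\0')
        {
            ++words;
        }
    }
    //Memory reservation: one pointer per word, then one buffer per word
    cpy0 = malloc((size_t) words * sizeof *cpy0);
    if(cpy0 == NULL)
        return NULL;
    for(k = 0; k < words; ++k)
    {
        cpy0[k] = malloc(MAX_WORD_LEN);
        if(cpy0[k] == NULL)
        {
            while(k > 0)
                free(cpy0[--k]);
            free(cpy0);
            return NULL;
        }
    }

    //Splitting
    k = 0;
    for(i = 0; i <= len && k < words; ++i)
    {
       if(text[i] == split_char || text[i] == '\0')
       {
           cpy0[k++][j] = '\0';   //Terminate the current word
           j = 0;
       }
       else if(text[i] != '\n' && j < MAX_WORD_LEN - 1)
       {
           //Skipping '\n' is helpful when using fgets();
           //characters beyond MAX_WORD_LEN - 1 are dropped
           cpy0[k][j++] = text[i];
       }
    }

    *w_count = words;
    return cpy0;
}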
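The fixed maximum token length can be avoided by measuring each token before allocating it. The following is only a sketch of that idea, not a drop-in replacement for txtspt(): the names split_alloc and free_tokens are made up, it keeps every empty token (including one after a trailing delimiter), and it does not strip '\n'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: every token gets a buffer of exactly its own size.
 * Returns a malloc'ed array of *count_out strings, or NULL on allocation
 * failure. Release the result with free_tokens(). */
char **split_alloc(const char *text, char delim, size_t *count_out)
{
    assert(text != NULL && count_out != NULL);

    /* One token more than the number of delimiters; empty tokens are kept. */
    size_t count = 1;
    for (const char *p = text; *p != '\0'; ++p)
        if (*p == delim)
            ++count;

    char **tokens = malloc(count * sizeof *tokens);
    if (tokens == NULL)
        return NULL;

    const char *curr = text;
    for (size_t i = 0; i < count; ++i) {
        const char *end = strchr(curr, delim);
        size_t len = (end != NULL) ? (size_t)(end - curr) : strlen(curr);

        tokens[i] = malloc(len + 1);
        if (tokens[i] == NULL) {      /* undo the partial work on failure */
            while (i > 0)
                free(tokens[--i]);
            free(tokens);
            return NULL;
        }
        memcpy(tokens[i], curr, len);
        tokens[i][len] = '\0';
        curr = (end != NULL) ? end + 1 : curr + len;
    }
    *count_out = count;
    return tokens;
}

/* Frees what split_alloc() returned. */
void free_tokens(char **tokens, size_t count)
{
    if (tokens == NULL)
        return;
    for (size_t i = 0; i < count; ++i)
        free(tokens[i]);
    free(tokens);
}
```

With this version the caller no longer needs to know MAX_WORD_LEN, at the cost of one extra pass over each token to measure it.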