I was wondering how we would go about splitting strings into tokens, or whether there are other efficient ways of doing this.
For example, I have:
char string1[] = "hello\tfriend\n";
How would I get "hello" and "friend" in their own separate variables?
Here is a very simple example that splits your string into parts, saved in an array of character arrays, using a start and an end pointer. The MAXL and MAXW defines are simply a convenient way to set the constants that limit each word to 32 bytes (31 chars + null terminator) and the result to a maximum of 3 words (parts) of the original string:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXL 32
#define MAXW 3

int main (void) {

    char string1[] = "hello\tfriend\n";
    char *sp = string1;                 /* start pointer */
    char *ep = string1;                 /* end pointer */
    unsigned c = 0;                     /* temp character */
    unsigned idx = 0;                   /* index for part */
    unsigned i = 0;                     /* loop counter for output */
    char strings[MAXW][MAXL] = {{0}};   /* array to hold parts */

    while (*ep && idx < MAXW)           /* for each char, while parts remain */
    {
        if (*ep == '\t' || *ep == '\n') /* test if \t or \n */
        {
            c = *ep;                    /* save character */
            *ep = 0;                    /* replace with null-terminator */
            strncpy (strings[idx], sp, MAXL - 1); /* copy at most MAXL-1 chars */
            *ep = c;                    /* replace w/original character */
            idx++;                      /* increment index */
            sp = ep + 1;                /* set start pointer */
        }
        ep++;                           /* advance to next char */
    }

    printf ("\nOriginal string1 : %s\n", string1);

    for (i = 0; i < idx; i++)
        printf (" strings[%u] : %s\n", i, strings[i]);

    return 0;
}
Output
$ ./bin/split_hello
Original string1 : hello friend
strings[0] : hello
strings[1] : friend
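Using strtok simply replaces the manual pointer logic with a function call that splits the string. Note that strtok modifies the string it is given, overwriting each delimiter it finds with a null character.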
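As a rough sketch of what the strtok version might look like: split_tokens below is a hypothetical helper name (not part of the standard library), and it stores pointers into the original string rather than copying each part. It assumes the caller passes a writable array, since strtok writes into it.

```c
#include <stdio.h>
#include <string.h>

#define MAXW 3   /* maximum number of parts, as in the example above */

/* Hypothetical helper: split s in place on any character in delims.
 * Stores up to max token pointers in parts and returns how many
 * were stored. The pointers refer to memory inside s. */
size_t split_tokens (char *s, const char *delims, char **parts, size_t max)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok (s, delims); tok && n < max; tok = strtok (NULL, delims))
        parts[n++] = tok;               /* save pointer to this token */

    return n;
}
```

Calling split_tokens (string1, "\t\n", parts, MAXW) with a char *parts[MAXW] array should leave parts[0] pointing at "hello" and parts[1] at "friend". Because strtok treats a run of delimiters as a single separator, empty parts are never returned.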
Updated Line-end Handling Example
As you have found, when stepping through the string you can create as simple an example as you need to fit the current string, but with a little extra effort you can expand your code to handle a broader range of situations. In your comment you noted that the above code does not handle the situation where there is no newline at the end of the string. Rather than changing the code to handle just that situation, with a bit of thought you can improve the code so it handles both situations. One approach would be:
    while (*ep && idx < MAXW)           /* for each char, while parts remain */
    {
        if (*ep == '\t' || *ep == '\n') /* test if \t or \n */
        {
            c = *ep;                    /* save character */
            *ep = 0;                    /* replace with null-terminator */
            strncpy (strings[idx], sp, MAXL - 1); /* copy at most MAXL-1 chars */
            *ep = c;                    /* replace w/original character */
            idx++;                      /* increment index */
            sp = ep + 1;                /* set start pointer */
        }
        else if (!*(ep + 1)) {          /* check if next is ending */
            strncpy (strings[idx], sp, MAXL - 1); /* handle no ending '\n' */
            idx++;
        }
        ep++;                           /* advance to next char */
    }
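Another way to cover both endings without special-casing the last character is to measure each part with strcspn, which stops at the next delimiter or at the end of the string, whichever comes first. The sketch below is only one possible arrangement: split_fields is a hypothetical helper name, and it copies from a const string instead of temporarily writing a null into the original.

```c
#include <string.h>

#define MAXL 32   /* max part length, including the null terminator */
#define MAXW 3    /* max number of parts */

/* Hypothetical helper: copy each delimiter-separated field of s into
 * out[], truncating fields longer than MAXL-1 chars. Returns the
 * number of fields stored (at most maxw). Empty fields between
 * adjacent delimiters are kept, as in the loop above. */
size_t split_fields (const char *s, const char *delims,
                     char out[][MAXL], size_t maxw)
{
    size_t idx = 0;

    while (*s && idx < maxw) {
        size_t len = strcspn (s, delims);       /* chars before next delim or end */
        size_t n = len < MAXL - 1 ? len : MAXL - 1;

        memcpy (out[idx], s, n);                /* copy (possibly truncated) field */
        out[idx][n] = 0;                        /* null-terminate */
        idx++;

        s += len;                               /* move to delimiter or end */
        if (*s)
            s++;                                /* step over the delimiter */
    }
    return idx;
}
```

With the delimiters "\t\n", both "hello\tfriend\n" and "hello\tfriend" should produce the same two parts, because the end of the string terminates a field just as a delimiter does.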
Break on Any Format/Non-Print Character
To broaden the set of separator characters further, rather than testing for discrete values you can use a range of ASCII values to treat every non-printing or format character as a separator. Note that a space falls inside the printable range, so it is kept as part of a word. A slightly different approach can be used:
    char string1[] = "\n\nhello\t\tmy\tfriend\tagain\n\n";
    char *p = string1;                  /* pointer to char */
    unsigned idx = 0;                   /* index for part */
    unsigned i = 0;                     /* generic counter */
    char strings[MAXW][MAXL] = {{0}};   /* array to hold parts */

    while (*p)                          /* for each char in string1 */
    {
        if (idx == MAXW) {              /* test MAXW not exceeded */
            fprintf (stderr, "error: MAXW (%d) words in string exceeded.\n", MAXW);
            break;
        }

        /* skip each non-print/format char */
        while (*p && (*p < ' ' || *p > '~'))
            p++;

        if (!*p) break;                 /* if end of string, break */

        while (*p >= ' ' && *p <= '~')  /* for each printable char */
        {
            if (i < MAXL - 1)           /* copy only while room remains */
                strings[idx][i++] = *p; /* copy to strings array */
            p++;                        /* advance to next char */
        }
        strings[idx][i] = 0;            /* null-terminate strings */
        idx++;                          /* next index in strings */
        i = 0;                          /* start at beginning char */
    }
This will handle your test string regardless of line ending and regardless of the number of tabs or newlines included. Take a look at ASCII Table and Description as a reference for the character ranges used.
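If you would rather not hard-code the ASCII limits, the same idea can be written with isprint from <ctype.h>, which in the default "C" locale is true for the same range (space through '~'). The following is a minimal sketch under that assumption; split_printable is a hypothetical helper name, and characters beyond MAXL-1 in an over-long word are silently dropped.

```c
#include <ctype.h>
#include <stddef.h>

#define MAXL 32   /* max part length, including the null terminator */
#define MAXW 3    /* max number of parts */

/* Hypothetical helper: treat every non-printable character as a
 * separator and copy each run of printable characters into out[].
 * Returns the number of parts stored (at most maxw). */
size_t split_printable (const char *s, char out[][MAXL], size_t maxw)
{
    size_t idx = 0;

    while (*s && idx < maxw) {
        /* skip each non-print/format char */
        while (*s && !isprint ((unsigned char)*s))
            s++;

        if (!*s)                        /* only separators remained */
            break;

        size_t i = 0;
        while (isprint ((unsigned char)*s)) {   /* false at the terminator */
            if (i < MAXL - 1)           /* copy only while room remains */
                out[idx][i++] = *s;
            s++;
        }
        out[idx][i] = 0;                /* null-terminate this part */
        idx++;
    }
    return idx;
}
```

The cast to unsigned char matters because passing a negative char value to the <ctype.h> functions is undefined behavior. With MAXW set to 3, the four-word test string above yields only its first three words; raise MAXW to collect them all.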