
I'm trying to tokenize a string, and here is my attempt:

#include <stdio.h>
#include <string.h>

char new_str[1024];
void tokenize_init(const char str[]){//copy the string into global section
  strcpy(new_str,str);
}

int i = 0;
char *tokenize_next() {
  const int len = strlen(new_str);
  for(; i <= len; i++) {
    if ( i == len) {
      return NULL;
    }
    if ((new_str[i] >= 'a' && new_str[i] <= 'z') ||
        (new_str[i] >= 'A' && new_str[i] <= 'Z')) {
      continue;
    }else {
      new_str[i] = '\0';
      i = i + 1;
      return new_str;
    }
  }
  return NULL;
}

//main function
int main(void) {
  char sentence[] = "This is a good-sentence for_testing 1 neat function.";
  printf("%s\n", sentence);
  tokenize_init(sentence);
  for (char *nt = tokenize_next();
       nt != NULL;
       nt = tokenize_next())
    printf("%s\n",nt);
}

However, it only prints the first word of the sentence ("This") and then stops. Can someone tell me why? My guess is that new_str is not persistent, and when main calls tokenize_next() again, new_str has become just the first word of the sentence. Thanks in advance.

OKC
  • Is there any specific reason why you aren't just using `strtok()` to tokenize the string? – Timo Geusch Jul 15 '13 at 18:35
  • http://www.elook.org/programming/c/strtok.html may help –  Jul 15 '13 at 18:35
  • Isn't `strsep` the new hotness? – Carl Norum Jul 15 '13 at 18:37
  • This is because `strtok()` replaces delimiters with `\0` (NUL) characters. Read here how `strtok()` works: [C `strtok()` split string into tokens but keep old data unaltered](http://stackoverflow.com/questions/17104953/c-strtok-split-string-into-tokens-but-keep-old-data-unaltered/17104999#17104999). In the string `sentence`, `strtok()` puts `\0` after `"This"`. – Grijesh Chauhan Jul 15 '13 at 18:46

1 Answer

It only prints "This" because you iterate to the first non-letter character, which happens to be a space, and replace it with a null terminator at this line:

new_str[i] = '\0'; 

After that, it doesn't matter what you do to the rest of the string; it will only print up to that point. The next time tokenize_next is called, the length of the string is no longer what you think it is: strlen(new_str) now counts only the word "This" (4 characters), while "i" has already moved past it (to 5). The loop condition i <= len is therefore false from the start, the loop body never runs, and the function falls through to its final return, as does every later call:

return NULL;

To fix the function, you would need to keep a pointer to where the next word starts and move it past that '\0' on each call, instead of always scanning from the beginning of new_str.

However, this is quite kludgy. You are much better off using one of the functions mentioned in the comments, such as `strtok` or `strsep` (the latter is a BSD/glibc extension rather than standard C).

UPDATE:

If you cannot use those functions, a redesign of your function would be ideal; however, per your request, try the following modifications:

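#include <stdio.h>
#include <string.h>

char new_str[1024];
char* str_accessor;   // start of the text that has not been tokenized yet

void tokenize_init(const char str[]){//copy the string into global section
   strcpy(new_str,str);
   str_accessor = new_str;
}

// true for ASCII letters, the only characters that belong to a token
static int is_letter(char c) {
   return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

char* tokenize_next(void) {
   // skip runs of non-letters such as " 1 " so no empty tokens are returned
   while (*str_accessor != '\0' && !is_letter(*str_accessor))
      str_accessor++;

   if (*str_accessor == '\0')
      return NULL;   // no words left

   const int len = strlen(str_accessor);

   for(int i = 0; i <= len; i++) {

      if (i < len && is_letter(str_accessor[i])) {
         continue;   // still inside the current word
      }

      // the word ends here: at a non-letter, or at the end of the string
      char* output = str_accessor;

      if (i < len) {
         str_accessor[i] = '\0';                // terminate the token
         str_accessor = str_accessor + i + 1;   // next call starts after it
      }
      else {
         str_accessor = str_accessor + i;       // last word: stop at the final '\0'
      }

      return output;
   }
   return NULL;
}

//main function
int main(void) {

   char sentence[] = "This is a good-sentence for_testing 1 neater function.";
   printf("%s\n", sentence);

   tokenize_init(sentence);
   for (char *nt = tokenize_next(); nt != NULL; nt = tokenize_next())
         printf("%s\n",nt);
}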
dtmland
  • This is an exercise in the book, and the restriction is that I cannot use something like strtok and strsep. Do you have any idea how to implement these functions in a better way? – OKC Jul 15 '13 at 19:17
  • and you say I should "update your pointer to look past that character on the next iteration". Can you explain how? Thank you. – OKC Jul 15 '13 at 19:20
  • @OKC Answer updated. If this satisfies the requirement, don't forget to select the green check mark. Thanks – dtmland Jul 15 '13 at 19:35
  • Thanks for your help first. But I have another question: for the line "new_str = new_str + i + 1;" I got an error message that says "error: array type 'char [1024]' is not assignable". What does it mean? – OKC Jul 15 '13 at 19:53
  • It still has a small problem. There are two newlines between "testing" and "neat". Do you know why? – OKC Jul 15 '13 at 20:20
  • Thanks for helping again; it works perfectly. But I'm wondering what "str_accessor = str_accessor + i + 1;" means. It seems that you are adding a value to a pointer, right? Can you explain this to me? – OKC Jul 16 '13 at 03:43
  • I notice there is a bug in your code. If you take out the period at the end of the sentence, it won't print out "function". – OKC Jul 17 '13 at 05:46
  • @OKC If I show you how to fix that, will you upvote my answer? – dtmland Jul 17 '13 at 14:56