0

I'm trying to add characters to a string one by one. I have something like this:

void doline(char *line, char *buffer, char** tokens){
}

and i am calling it like:

char *line = malloc(1025 * sizeof(char *));
fgets(line, 1024, stdin);
int linelength = strlen(line);
if (line[linelength - 1] == '\n'){
    line[linelength - 1] = '\0';
}

char ** tokens = (char **) malloc(strlen(line) * sizeof(char *));
char *emptybuffer = malloc(strlen(line) * sizeof(char *));

parseline(line, emptybuffer, tokens);

So doline will go through line and tokenize it based on various conditions and place fragments of it into tokens. I am building the temp string in the variable buffer To do this, I need to go through line character by character.

I am currently doing:

buffer[strlen(buffer)] = line[i];

And then at the end of the loop:

*buffer++ = '\0';

But this is the result:

printf("Working on line: '%s' %d\n", line, strlen(line));

Outputs: Working on line: 'test' 4

But by the end of the function the buffer is:

*buffer++ = '\0';
printf("Buffer at the very end: '%s' %d\n", buffer, strlen(buffer));

Outputs: Buffer at the very end: 'test' 7

So the output is showing that the string is getting messed up. What's the best way to build this string character by character? Are my string manipulations correct?

Any help would be much appreciated!

Thanks!

Rila
  • 197
  • 1
  • 2
  • 7
  • 1
    `strlen()` is only going to work if the terminating `\0` is present; trying to use it to add a character to an unterminated string (or to add the terminator!) is going to walk off the end. – geekosaur Apr 25 '12 at 02:51
  • Ah, that makes sense sir. What's the best way to do this then, if I can't utilize strlen as the index? – Rila Apr 25 '12 at 02:53
  • 1
    I could try keeping an internal count of what index I'm on. Let me try this out! – Rila Apr 25 '12 at 02:53
  • Is using *buffer++ = '\0'; alright even when the string isn't terminated? – Rila Apr 25 '12 at 02:56
  • Note: your line buffer is bigger than you expect. You should be calculating your buffer size using `sizeof(char)` rather than `sizeof(char*)`. – Michael Anderson Apr 25 '12 at 02:57
  • 1
    ...note that `sizeof (char)` is guaranteed to be 1. – geekosaur Apr 25 '12 at 02:57
  • @Rila, it's valid if you've been keeping `buffer` updated appropriately. `strlen()` depends on the `\0`, so it's easy to note that it won't work without the `\0`; for your code using `buffer` we would have to see the actual code to be certain. – geekosaur Apr 25 '12 at 02:59
  • Good point, I've adjusted the code to char *emptybuffer = malloc(1024 * sizeof(char)); But I have a very strange problem: When I do for(; i < strlen(line); ++i){ printf("On char: '%c'\n Buffer is '%s'", line[i], buffer); I'm getting output like: Buffer is '≈' On char 't'. t is the first character, and the buffer is coming straight from malloc, it looks like it's already coming into the function corrupted? – Rila Apr 25 '12 at 03:04
  • Everytime I update buffer, I haven't been adding the null terminator. I'm only adding that at the very end. I've changed my code to use an internal index count I'm keeping. I've updated my original question to show the actual code – Rila Apr 25 '12 at 03:06
  • `buffer` has to be null-terminated too for `printf` to work correctly with the `%s` specifier (i.e. C-style strings). – dirkgently Apr 25 '12 at 03:13
  • Also, note that `fgets` reads at most one character less than the number of characters specified as its second argument. So, your `line` object needs to be `malloc`ed to only `1024` if that is the limit you want to place on the buffer size. `fgets` properly null terminates the resulting buffer, if no error occurs. – dirkgently Apr 25 '12 at 03:17
  • So it looks like I should adjust the code to add null terminators, and then overwrite the terminators when I add new chars? – Rila Apr 25 '12 at 03:20
  • I'm using the following to call fgets: char *line = malloc(1025 * sizeof(char *)); fgets(line, 1024, stdin); int linelength = strlen(line); if (line[linelength - 1] == '\n'){ line[linelength - 1] = '\0'; } I can adjust it down from 1025 to 1024 if the extra safety wasn't necessary – Rila Apr 25 '12 at 03:21
  • @Rila: Yes, you should adjust your code whenever you want to deal with strings (think of copying strings, getting the string length and so on). Also, the `parseline` you posted has a memory leak: I assume that the `buffer` passed to it has already been allocated, why are you allocating a fresh piece of memory to it without releasing what it already points to within the `for`-loop? – dirkgently Apr 25 '12 at 03:24
  • I'm not sure how I should pass in buffer into parseline so that it's null terminated. I also need to add correct memory management, you're right – Rila Apr 25 '12 at 03:29
  • I've updated my code to use null terminators. See original post's edit. – Rila Apr 25 '12 at 03:41
  • Guys, I've figured it out. It was the null terminators, I needed to add the null terminator every time I edit the buffer string. Thanks! If one of you would like to submit an answer repeating what you said here, I'd be happy to accept it as the answer :) – Rila Apr 25 '12 at 03:47
  • @Rila would you like to consider my post for an answer?! – Sangeeth Saravanaraj Apr 25 '12 at 07:30

1 Answers1

1

There were some basic problems so I re-written the program.

#include <stdio.h>
#include <stdlib.h>

#define str_len 180

void tokenize(char *str, char **tokens)
{
    int length = 0, index = 0;
    int i = 0;
    int str_i;
    int tok_i;

    while(str[length]) {
        if (str[length] == ' ') {
            /* this charecter is a space, so skip it! */
            length++;
            index++;

            tokens[i] = malloc(sizeof(char) * index);

            tok_i = 0;           
            for (str_i=length-index ; str_i<length; str_i++) {
                tokens[i][tok_i] = str[str_i];
                tok_i++;
            }

            tokens[i][tok_i] = '\0';
            i++;
            index = 0;
        }
        length++;
        index++;
    }       

    /* copy the last word in the string */
    tokens[i] = malloc(sizeof(char) * index);
    tok_i = 0;           
    for (str_i=length-index ; str_i<length; str_i++) {
        tokens[i][tok_i] = str[str_i];
        tok_i++;
    }
    tokens[i][tok_i] = '\0';
    tokens[i++] = NULL;

    return;         
}

int main()
{
    char *str = malloc(str_len * sizeof(char));
    char **tokens = malloc(100 * sizeof(char *));
    int i = 0;

    if (str == NULL || tokens == NULL)
        return 1;

    gets(str);
    printf("input string: %s\n", str);
    tokenize(str, tokens);

    while(tokens[i] != NULL) {
        printf("%d - %s \n", i, tokens[i]);
        i++;
    }

    while(tokens[i])
        free(tokens[i]);
    free(tokens);
    free(str);

    return 0;
}

It is compiled and executed as follows:

$ gcc -ggdb -Wall prog.c 
$ ./a.out 
this is a test string... hello world!! 
input string: this is a test string... hello world!! 
0 - this  
1 - is  
2 - a  
3 - test  
4 - string...  
5 - hello  
6 - world!!  
$ 

There were few basic assumptions:

  1. the length of the incoming string is assumed to a constant. This can be done dynamically - please check this - How to read a line from the console in C?.

  2. The length of the tokens array is also assumed to be a constant. This can also be changed. I will leave that to you to find out how!

Hope this helps!

Community
  • 1
  • 1
Sangeeth Saravanaraj
  • 16,027
  • 21
  • 69
  • 98