How to use a string pointer to access the rest of the string?

Question

I'm trying to write the unique strings in a file to a linked list and increment a count for each duplicate word. I want to use a getNextWord function that returns a pointer to the next word in the file. The problem is that I am very new to c and pointers so I actually don't know what to do in my main method to call getNextWord and how to use this string pointer to actually get access to the string it's pointing at. So how do I use my function to get the string that needs to be a key for a node? Also any other advice would be greatly appreciated and if you see anything wrong with my function or struct please let me know! Thank you very much for your time. Here's my function and struct..

#define MAX_WORD_LEN 256    

struct list {
    int count;
    char string[MAX_WORD_LEN];
    struct list *next;
};

char* getNextWord(FILE* fd) {
    char c;
    char wordBuffer[MAX_WORD_LEN];
    int putChar = 0;

    while((c = fgetc(fd)) != EOF) {
        if(isalnum(c)) break;
    }
    if (c == EOF) return NULL;

    wordBuffer[putChar++] = tolower(c);

    while((c = fgetc(fd)) != EOF) {
        if(isspace(c) || putChar >= MAX_WORD_LEN -1) break;

        if(isalnum(c)) {
            wordBuffer[putChar++] = tolower(c);
        }
    }
    wordBuffer[putChar] = '\0';
    return strdup(wordBuffer);
}

I'd start by passing the `char` buffer *in* to this function, not allocating one here and trusting the caller to free it. You're already pinned to a fixed buffer length with your linked list node definition, so use its buffer directly, or have this function create a list node (which should be dynamic) rather than just the string within. — WhozCraig, Feb 09 '14 at 04:14
My main problem is that i don't know what syntax to call getNextWord in order to actually use the string it's pointing at. But I'll definitely still try to improve my code with the advice. Thanks for posting — Josh, Feb 09 '14 at 04:28

score 0 · Answer 1 · edited May 23 '17 at 12:28

I see a small discrepancy between your list node definition and the function that retrieves words from a file.

In C, you are completely in charge of memory allocation, so you must decide who will allocate the data and who will free them.

Here your file parsing function does the allocation. It means the pointer returned by getNextWord will reference dynamic memory that has to be freed at some point.

At the same time, your node structure holds another memory buffer that is meant to represent the same piece of data.

With your current implementation, you will have to copy the string gotten through getNextWord into your node and then free the string, like so:

char * new_word = getNextWord (file);
strcpy (my_node->string, new_word);
free (new_word );

That is a waste of time and resources : each character is copied twice (once inside getNextWord and another time in strcpy).

To avoid the duplication, you can do basically two things.

1) change the string field to a char * to keep references to getNextWord results

struct list {
    int count;
    char * string;   // <- reference to the string allocated by getNextWord
    struct list *next;
};

// populating node
my_node->string = getNextWord (file);

In that case, each node becomes responsible for eventually freeing the string.

2) keep string as a char buffer and have getNextWord fill it directly

typedef char wordBuffer_t[MAX_WORD_LEN];

struct list {
    int          count;
    wordBuffer_t string; // <- storage for the word inside each node
    struct list *next;
};

char* getNextWord(FILE* fd, wordBuffer_t wordBuffer) {
    // rest of the code does not change

// populating node
getNextWord (file, my_node->string);

In that case, no dynamic allocation is required. On the other hand, each node will need to allocate enough room for the biggest possible string, which will be inefficient in terms of memory consumption.

As an important side note, I advise you to use typedefs systematically when using any kind of array that might be used as a function parameter (here wordBuffer_t is passed to getNextWord).
It will make your code far more readable and save you from this common C pitfall

score 0 · Answer 2 · answered Feb 09 '14 at 04:59

Not the complete answer - but strtok is a good function to use for this kind of thing. You need to load the entire string you want to parse into memory (rather than read from file a character at a time), but after that all your memory is basically allocated. Here is something to get you going:

char myString[BUFLEN];
char *nextWord;
fgets(myString, BUFLEN, stdin); // if you get from file, change this accordingly 

nextWord = strtok(myString, " ;,.:\t\n");  // use whatever delimiters you expect
printf("The first word is %s\n", nextWord); // strtok returns pointer to start of match
strncpy (my_node->string, nextWord, MAX_WORD_LEN-1); // copy the word to your structure
my_node->string[MAX_WORD_LEN-1]='\0';  // in case word was too long - prevent trouble

while((nextWord = strtok(NULL, " .,;:\t\n")) != NULL) {
  printf("The next word is %s\n", nextWord);
  strncpy (other_node->string, nextWord, MAX_WORD_LEN-1); // copy the word to another node in your structure
  other_node->string[MAX_WORD_LEN-1]='\0';  // in case word was too long - prevent trouble
  // … do whatever else to manage your list
}

This is just to show you how to use strtok to extract words from a string. Interestingly you only call strtok once with the address of the string - after that you use NULL (it actually keeps track of where it is in the string it is "eating"). Note - in the process of chomping through the string, '\0' characters get inserted… This means you can't re-use the string after you have run it through strtok. But otherwise it's a useful function for your purpose.

How to use a string pointer to access the rest of the string?

2 Answers2