
Let's say we have a string of words that are delimited by commas. I want to write code in C to store these words in a variable.

Example

amazon, google, facebook, twitter, salesforce, sfb

We do not know how many words are present.

If I were to do this in C, I thought I would need 2 iterations. In the first iteration, I count how many words are present. Then, in the next iteration, I store each word.

Step 1: 1st loop -- count number of words
....
....
//End 1st loop. num_words is set. 

Step 2:
// Do malloc using num_words.
char **array = (char**)malloc(num_words* sizeof(char*));

Step 3: 2nd loop -- Store each word. 
// First, walk until the delimiter and determine the length of the word
// Once len_word is determined, do malloc
array[i] = (char*)malloc((len_word + 1) * sizeof(char)); // i = index of the current word; +1 for the terminating '\0'
// And then store the word to it

// Do this for all words and then the 2nd loop terminates

Can this be done more efficiently? I do not like having 2 loops. I think there must be a way to do it in 1 loop with just basic pointers.

The only restriction is that this needs to be done in C (due to constraints that are not in my control)

leopoodle
  • Note: `sizeof(void)` is invalid, because `void` is an incomplete type. You probably meant `sizeof(char *)` or `sizeof(*array)`. – John Bollinger Nov 10 '18 at 19:22
  • Without counting words, you can use `malloc` for the first word and then grow the array of pointers with `realloc` for every new word. – mangusta Nov 10 '18 at 19:25

3 Answers


You don't need to do a separate pass to count the words. You can use realloc to enlarge the array on the fly as you read in the data on a single pass.

To parse an input line buffer, you can use strtok to tokenize the individual words.

When saving the parsed words into the word list array, you can use strdup to create a copy of the tokenized word. This is necessary for the word to persist: strtok returns pointers into the line buffer, so whatever you were pointing to from the first line will get clobbered when the second line is read into the same buffer (and so on).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

char **words;
size_t wordmax;
size_t wordcount;

int
main(int argc,char **argv)
{
    char *cp;
    char *bp;
    FILE *fi;
    char buf[5000];

    --argc;
    ++argv;

    // get input file name
    cp = *argv;
    if (cp == NULL) {
        printf("no file specified\n");
        exit(1);
    }

    // open input file
    fi = fopen(cp,"r");
    if (fi == NULL) {
        printf("unable to open file '%s' -- %s\n",cp,strerror(errno));
        exit(1);
    }

    while (1) {
        // read in next line -- bug out if EOF
        cp = fgets(buf,sizeof(buf),fi);
        if (cp == NULL)
            break;

        bp = buf;
        while (1) {
            // tokenize the word
            cp = strtok(bp," \t,\n");
            if (cp == NULL)
                break;
            bp = NULL;

            // expand the space allocated for the word list [if necessary]
            if (wordcount >= wordmax) {
                // this is an expensive operation so don't do it too often
                wordmax += 100;

                words = realloc(words,(wordmax + 1) * sizeof(char *));
                if (words == NULL) {
                    printf("out of memory\n");
                    exit(1);
                }
            }

            // get a persistent copy of the word text
            cp = strdup(cp);
            if (cp == NULL) {
                printf("out of memory\n");
                exit(1);
            }

            // save the word into the word array
            words[wordcount++] = cp;
        }
    }

    // close the input file
    fclose(fi);

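    // trim the array to exactly what we need/used (plus the terminator slot);
    // doing this first also covers an input with no words, where words is NULL
    words = realloc(words,(wordcount + 1) * sizeof(char *));
    if (words == NULL) {
        printf("out of memory\n");
        exit(1);
    }

    // add a null terminator
    words[wordcount] = NULL;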

    // NOTE: because we added the terminator, _either_ of these loops will
    // print the word list
#if 1
    for (size_t idx = 0;  idx < wordcount;  ++idx)
        printf("%s\n",words[idx]);
#else
    for (char **word = words;  *word != NULL;  ++word)
        printf("%s\n",*word);
#endif

    return 0;
}
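
Since the question is about a string rather than a file, here is a minimal sketch of the same single-pass idea applied to an in-memory buffer. The names `split_words` and `free_words` are made up for this illustration, and the capacity is doubled instead of grown by 100, which is just one possible growth policy. It assumes the buffer is writable (strtok modifies it) and that strdup is available (POSIX, and standard C as of C23).

```c
#define _POSIX_C_SOURCE 200809L  // for strdup() under strict -std=c99/c11
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: free a NULL-terminated list built by split_words().
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (char **w = words; *w != NULL; ++w)
        free(*w);
    free(words);
}

// Hypothetical helper: split a writable buffer into words in one pass.
// Returns a NULL-terminated array of heap copies and stores the word count
// in *count_out; returns NULL if an allocation fails.
char **split_words(char *buf, size_t *count_out)
{
    assert(buf != NULL && count_out != NULL);

    const char *delims = " \t,\n";
    size_t cap = 8;      // slots currently allocated
    size_t count = 0;    // words stored so far
    char **words = malloc(cap * sizeof *words);
    if (words == NULL)
        return NULL;

    for (char *tok = strtok(buf, delims); tok != NULL; tok = strtok(NULL, delims)) {
        // always keep one spare slot for the NULL terminator
        if (count + 1 >= cap) {
            char **tmp = realloc(words, 2 * cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            words = tmp;
            cap *= 2;
        }

        // tok points into buf, so take a persistent copy
        char *copy = strdup(tok);
        if (copy == NULL)
            goto fail;
        words[count++] = copy;
    }

    words[count] = NULL;
    *count_out = count;
    return words;

fail:
    words[count] = NULL;  // terminate what we have so free_words() can walk it
    free_words(words);
    return NULL;
}
```

A caller would copy the input into a `char buf[]`, call `split_words(buf, &n)`, walk the result until the NULL entry, and release it with `free_words`.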
Craig Estey

What you're looking for is http://manpagesfr.free.fr/man/man3/strtok.3.html

(From man page)

The strtok() function parses a string into a sequence of tokens. On the first call to strtok() the string to be parsed should be specified in str. In each subsequent call that should parse the same string, str should be NULL.
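
The calling convention quoted above can be sketched as follows. `collect_tokens` is a hypothetical helper written for this illustration, and the delimiter set `", "` is an assumption chosen to match the example in the question. Note that the stored pointers refer to the original buffer, not to copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: collect up to max token pointers from str.
// strtok() overwrites each delimiter it consumes with '\0', so str must be
// writable, and the stored pointers point into str itself (no copies made).
size_t collect_tokens(char *str, char *out[], size_t max)
{
    assert(str != NULL && out != NULL);

    size_t n = 0;
    // first call passes the string; every later call passes NULL to continue
    for (char *tok = strtok(str, ", ");
         tok != NULL && n < max;
         tok = strtok(NULL, ", "))
        out[n++] = tok;
    return n;
}
```

Because the tokens live inside the buffer, they only stay valid as long as the buffer is not reused or freed.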

But this thread looks like a duplicate of "Split string with delimiters in C", unless you are forced to produce your own implementation.

Asya Corbeau
  • This does not seem responsive to the question, which is about an approach that avoids separate loops for counting strings and recording the split results. Yes, `strtok` could be used in such a pursuit, but it is not the key here. – John Bollinger Nov 10 '18 at 19:28

We do not know how many words are present.

We know num_words <= strlen(string) + 1. Only 1 "loop" needed. The cheat here is a quick run down s via strlen().

// *alloc() out-of-memory checking omitted for brevity
char **parse_csv(const char *s) {
  size_t slen = strlen(s);
  size_t num_words = 0;
  char **words = malloc(sizeof *words * (slen + 1));

  // find, allocate, copy the words
  while (*s) {
    size_t len = strcspn(s, ",");
    words[num_words] = malloc(len + 1);
    memcpy(words[num_words], s, len);
    words[num_words][len] = '\0';
    num_words++;
    s += len;    // skip word
    if (*s) s++; // skip ,
  }

  // Only 1 realloc() needed.
  words = realloc(words, sizeof *words * num_words);  // right-size words list; realloc() may move the block
  return words;
}

It makes sense to NULL terminate the list, so

  char **words = malloc(sizeof *words * (slen + 1 + 1));
  ...
  words[num_words++] = NULL;
  words = realloc(words, sizeof *words * num_words);  // realloc() may move the block
  return words;

In considering the worst case for the initial char **words = malloc(...);, I assume a string like ",,," whose 3 ',' would make for 4 words: "", "", "", "". Adjust the code as needed for such pathological cases.
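
As written, the `while (*s)` loop stops once it reaches the end of the string, so for ",,," it stores the 3 empty words that precede each ',' and never emits the trailing one. One possible adjustment, as a sketch only: `parse_csv_keep_empty` is a hypothetical name, and it assumes every ',' separates two fields, so an empty input yields one empty word. Unlike the code above, it includes the out-of-memory checks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical variant of parse_csv(): every ',' separates two fields, so
// ",,," yields 4 empty words. Returns a NULL-terminated list, or NULL if an
// allocation fails.
char **parse_csv_keep_empty(const char *s)
{
    assert(s != NULL);

    // worst case: strlen(s) + 1 fields, plus 1 slot for the NULL terminator
    char **words = malloc(sizeof *words * (strlen(s) + 2));
    if (words == NULL)
        return NULL;

    size_t num_words = 0;
    for (;;) {
        size_t len = strcspn(s, ",");   // length of the current field
        char *w = malloc(len + 1);
        if (w == NULL) {
            while (num_words > 0)
                free(words[--num_words]);
            free(words);
            return NULL;
        }
        memcpy(w, s, len);
        w[len] = '\0';
        words[num_words++] = w;

        s += len;                       // skip the field
        if (*s == '\0')
            break;                      // no delimiter left: last field stored
        s++;                            // skip ','
    }
    words[num_words++] = NULL;

    // right-size the list; keep the larger block if shrinking fails
    char **tmp = realloc(words, sizeof *words * num_words);
    return tmp != NULL ? tmp : words;
}
```

Fields are copied verbatim, so with the question's example the words after the first keep their leading space unless the caller trims it.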

chux - Reinstate Monica