I have a file with a series of words separated by a white space. For example file.txt contains this: "this is the file". How can I use fscanf to take word by word and put each word in an array of strings?

Then I did this but I don't know if it's correct:

char *words[100];
int i=0;
while(!feof(file)){
        fscanf(file, "%s", words[i]);
        i++;
        fscanf(file, " ");
}
Jonathan Leffler

5 Answers

When reading repeated input, control the input loop with the return value of the input function itself (fscanf in your case). You could instead loop continually (e.g. for (;;) { ... }) and check separately whether the return is EOF, whether a matching failure occurred, or whether the return equals the number of conversion specifiers (success). In your case, simply checking that the return is 1, matching the single "%s" conversion specifier, is enough.
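As a minimal sketch of the continual-loop form, assuming a hypothetical helper named `count_words` and an assumed 32-byte word buffer (neither is part of the code in this answer), the separate checks on the return could look like this:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts whitespace-separated words in fp,
 * looping continually and inspecting the fscanf return on each pass. */
size_t count_words (FILE *fp)
{
    char word[32];                      /* assumed maximum word size */
    size_t n = 0;

    assert (fp != NULL);                /* caller must pass an open stream */

    for (;;) {
        int rtn = fscanf (fp, "%31s", word);    /* width guards the buffer */

        if (rtn == EOF)                 /* end of input or read error */
            break;
        if (rtn != 1)                   /* matching failure; not expected for
                                           %s, but possible for e.g. %d */
            break;
        n++;                            /* the single conversion succeeded */
    }

    return n;
}
```

For a single "%s" the two failure branches behave the same, which is why the shorter `== 1` test in the loop condition is usually preferred.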

To store each word in an array, you have several options. The simplest is a 2D array of char with automatic storage. Since the longest non-medical word in the Unabridged Dictionary is 29 characters (requiring a total of 30 characters with the nul-terminating character), a 2D array with a fixed number of rows and a fixed number of columns of at least 30 is fine. (Dynamically allocating lets you read and allocate memory for as many words as may be required, but that is left for later.)

So to set up storage for 128 words, you could do something similar to the following:

#include <stdio.h>

#define MAXW  32    /* if you need a constant, #define one (or more) */
#define MAXA 128

int main (int argc, char **argv) {

    char array[MAXA][MAXW] = {{""}};    /* array to store up to 128 words */
    size_t n = 0;                       /* word index */

Now simply open your filename provided as the first argument to the program (or read from stdin by default if no argument is given), and then validate that your file is open for reading, e.g.

    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

Now to the crux of your read-loop. Simply loop checking the return of fscanf to determine success/failure of the read, adding words to your array and incrementing your index on each successful read. You must also include in your loop-control a check of your index against your array bounds to ensure you do not attempt to write more words to your array than it can hold, e.g.

    /* "%31s" limits each read to MAXW - 1 characters so a long word cannot overflow a row */
    while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1)
        n++;

That's it, now just close the file and use your words stored in your array as needed. For example just printing the stored words you could do:

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)
        printf ("array[%3zu] : %s\n", i, array[i]);

    return 0;
}

Now just compile it with warnings enabled (e.g. -Wall -Wextra -pedantic for gcc/clang, or /W3 for VS cl.exe) and then test it on your file. The full code is:

#include <stdio.h>

#define MAXW  32    /* if you need a constant, #define one (or more) */
#define MAXA 128

int main (int argc, char **argv) {

    char array[MAXA][MAXW] = {{""}};    /* array to store up to 128 words */
    size_t n = 0;                       /* word index */
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* "%31s" limits each read to MAXW - 1 characters so a long word cannot overflow a row */
    while (n < MAXA && fscanf (fp, "%31s", array[n]) == 1)
        n++;

    if (fp != stdin) fclose (fp);   /* close file if not stdin */

    for (size_t i = 0; i < n; i++)
        printf ("array[%3zu] : %s\n", i, array[i]);

    return 0;
}

Example Input File

$ cat dat/thefile.txt
this is the file

Example Use/Output

$ ./bin/fscanfsimple dat/thefile.txt
array[  0] : this
array[  1] : is
array[  2] : the
array[  3] : file
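
The dynamic allocation mentioned earlier could be sketched roughly as follows. The helper name `read_words`, the starting capacity of 8, and the doubling growth policy are assumptions made for illustration and are not part of the program above:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXW 32     /* per-word read buffer, same size as above */

/* Hypothetical helper: reads every word from fp into a growing array.
 * Stores the number of words in *count and returns the array, or NULL
 * if nothing was stored. The caller frees each word, then the array. */
char **read_words (FILE *fp, size_t *count)
{
    char buf[MAXW];
    char **words = NULL;
    size_t n = 0, cap = 0;

    assert (fp != NULL && count != NULL);

    while (fscanf (fp, "%31s", buf) == 1) {
        if (n == cap) {                         /* array full, grow it */
            size_t newcap = cap ? cap * 2 : 8;
            char **tmp = realloc (words, newcap * sizeof *tmp);
            if (!tmp)                           /* keep what was read so far */
                break;
            words = tmp;
            cap = newcap;
        }
        words[n] = malloc (strlen (buf) + 1);   /* exact-size copy of the word */
        if (!words[n])
            break;
        strcpy (words[n], buf);
        n++;
    }

    *count = n;
    return words;
}
```

Assigning the realloc result to a temporary pointer first means the words already stored are not lost if the reallocation fails.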

Look things over and let me know if you have further questions.

David C. Rankin

As mentioned in the comments, using feof() does not work the way you might expect. And, as described in this answer, unless the content of the file is formatted very predictably, using any of the scanf family to parse out the words is overly complicated. I do not recommend using it for that purpose.

There are many other, better ways to read the content of a file word by word. My preference is to read each line into a buffer, then parse the buffer to extract the words. This requires deciding which characters may appear in the file but are not part of a word. Characters such as \n, \t, space, and - should be treated as delimiters and can be used to extract the words. The following is a recipe for extracting words from a file (example code for a few of the items is included below these steps):

  1. Read file to count words, and get the length of the longest word.
  2. Use count, and longest values from 1st step to allocate memory for words.
  3. Rewind the file.
  4. Read file line by line into a line buffer using while(fgets(line, size, fp)).
  5. Parse each new line into words using delimiters and store each word into arrays of step 2.
  6. Use resulting array of words as necessary.
  7. Free all allocated memory when finished with the arrays.

Example code for some of these tasks:

#include <ctype.h>  // isalnum
#include <stdio.h>

// Get count of words, and longest word in file
// Returns the longest word length, or -1 if the file cannot be opened
int longestWord(char *file, int *nWords)
{
    FILE *fp = fopen(file, "r");
    int cnt=0, longest=0, numWords=0;
    int c;

    if(!fp) return -1;

    while ( (c = fgetc(fp) ) != EOF )
    {
        if ( isalnum (c) ) cnt++;       // still inside a word
        else if ( cnt > 0 )             // first delimiter after a word
        {
            if (cnt > longest) longest = cnt;
            numWords++;
            cnt = 0;
        }
    }
    if ( cnt > 0 )                      // file ended in the middle of a word
    {
        if (cnt > longest) longest = cnt;
        numWords++;
    }
    *nWords = numWords;
    fclose(fp);

    return longest;
}

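#include <stdlib.h>     // calloc, free
#include <sys/types.h>  // ssize_t (POSIX)

// Create indexable memory for word arrays (returns NULL on allocation failure)
char ** Create2DStr(ssize_t numStrings, ssize_t maxStrLen)
{
    ssize_t i;
    char **a = calloc(numStrings, sizeof(char *));
    if(!a) return NULL;
    for(i=0;i<numStrings; i++)
    {
      a[i] = calloc(maxStrLen + 1, 1);  // +1 is room for null terminator
      if(!a[i])                         // undo earlier allocations on failure
      {
        while(i > 0) free(a[--i]);
        free(a);
        return NULL;
      }
    }
    return a;
}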

Usage: For a file with 25 words, the longest being 80 bytes:

char **strArray = Create2DStr(25, 80);  //creates 25 array locations
                                        //each 80+1 bytes long
                                        //(Create2DStr adds the +1 for the null terminator.)
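
Steps 4 and 5 of the recipe are not covered by the code above. A minimal sketch of them, assuming a hypothetical helper named `fill_words`, an assumed 1024-byte line buffer, and an assumed delimiter set, might look like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: arr must hold nwords pointers, each pointing to
 * at least maxlen + 1 bytes. Returns the number of words stored. */
size_t fill_words (FILE *fp, char **arr, size_t nwords, size_t maxlen)
{
    char line[1024];                    /* assumed maximum line length */
    const char *delims = " \t\r\n-";    /* characters that are not part of a word */
    size_t n = 0;

    assert (fp != NULL && arr != NULL);

    /* step 4: read the file one line at a time */
    while (n < nwords && fgets (line, sizeof line, fp)) {
        /* step 5: split the line on the delimiters and store each word */
        for (char *tok = strtok (line, delims);
             tok && n < nwords;
             tok = strtok (NULL, delims)) {
            snprintf (arr[n], maxlen + 1, "%s", tok);   /* copy, truncating if too long */
            n++;
        }
    }

    return n;
}
```

A line longer than the buffer is read in pieces, so a word that straddles the 1023-character boundary would be split in two; a larger buffer or a dynamically sized line reader avoids that.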
ryyker

strtok() might be a function that can help you here.

If you know that the words will be separated by whitespace, then calling strtok will return the char pointer to the start of the next word.

Sample code from https://www.systutorials.com/docs/linux/man/3p-strtok/

#include <string.h>
...
char *token;
char line[] = "LINE TO BE SEPARATED";  /* array, not a pointer to a literal: strtok writes into it */
char *search = " ";


/* Token will point to "LINE". */
token = strtok(line, search);


/* Token will point to "TO". */
token = strtok(NULL, search);

In your case, the space character would also act as a delimiter in the line. Note that strtok modifies the string passed in (it overwrites each delimiter it finds with '\0'), so if you need the original text you should tokenize a copy, for example one allocated with malloc.

It might also be easier to use fread() to read a block from a file.
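
A small sketch of tokenizing a copy so the original string stays intact. The helper name `print_tokens` and the whitespace delimiter set are assumptions for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: prints each whitespace-separated token of text
 * and returns how many were found, leaving text itself untouched.
 * Returns 0 if the copy cannot be allocated. */
size_t print_tokens (const char *text)
{
    size_t n = 0;
    char *copy;

    assert (text != NULL);

    copy = malloc (strlen (text) + 1);  /* writable copy for strtok */
    if (!copy)
        return 0;
    strcpy (copy, text);

    for (char *tok = strtok (copy, " \t\n"); tok; tok = strtok (NULL, " \t\n")) {
        printf ("token %zu: %s\n", n, tok);
        n++;
    }

    free (copy);                        /* tokens pointed into copy, so they are gone too */
    return n;
}
```

Because every token points into the copy, any word that must outlive the function needs its own storage before the copy is freed.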

Steven
int i=0;
char words[50][50];
while(fscanf(file, " %s ", words[i]) != EOF)
    i++;

I wouldn't entirely recommend doing it this way, because both the number of words in the file and the length of each "word" are unknown, and either can exceed 50. Allocate dynamically instead. Still, this should show you how it works.
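
If the fixed 50 x 50 array is kept, a bounded variant is one way to stop either limit from being exceeded. This is only a sketch; the helper name `read_bounded` and the two macros are assumptions, not part of the snippet above:

```c
#include <assert.h>
#include <stdio.h>

#define NWORDS 50   /* rows in the array, as in the snippet above */
#define WLEN   50   /* bytes per word, including the terminator */

/* Hypothetical helper: fills words[] from file and returns the count. */
int read_bounded (FILE *file, char words[NWORDS][WLEN])
{
    int i = 0;

    assert (file != NULL);

    /* "%49s" stores at most 49 characters plus '\0'; a longer word is
     * split across consecutive rows instead of overflowing one. */
    while (i < NWORDS && fscanf (file, "%49s", words[i]) == 1)
        i++;

    return i;
}
```

Comparing the return with 1 rather than EOF also ends the loop cleanly if a conversion ever fails for a reason other than end of file.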

  • When you tried to implement this, were there whitespace chars? – Aaron Derby Aug 28 '19 at 19:26
  • Note [Trailing white space in a `scanf()` format string is a UI disaster](https://stackoverflow.com/questions/19499060/what-is-difference-between-scanfd-and-scanfd). If the input is coming from a file, it isn't so bad, but from a terminal, it is appalling. The leading white space is unnecessary (`%s` skips white space anyway) but harmless. – Jonathan Leffler Aug 28 '19 at 21:58

How can I use fscanf to take word by word and put each word in an array of strings?

Read each word twice: first to find length via "%n". 2nd time, save it. (Inefficient yet simple)

Re-size strings as you go. Again inefficient, yet simple.

// Rough untested sample code - still need to add error checking.

size_t string_count = 0;
char **strings = NULL;

for (;;) {
  long pos = ftell(file);
  int n = 0;
  fscanf(file, "%*s%n", &n);  // record where scanning a "word" stopped
  if (n == 0) break;          // nothing scanned: end of file
  fseek(file, pos, SEEK_SET); // go back;
  strings = realloc(strings, sizeof *strings * (string_count+1));// increase array size
  strings[string_count] = malloc(n + 1u);  // Get enough memory for the word
  fscanf(file, "%s ", strings[string_count] );  // read/save word
  string_count++;             // count the stored word
}

// use strings[], string_count

// When done, free each strings[] and then strings
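
The measuring pass on its own can be sketched as a small function. The name `next_word_span` is hypothetical, and the sketch assumes the stream is seekable (a regular file, not a pipe or terminal):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reports how many characters fscanf would consume
 * to read the next word (leading white space included), then seeks back
 * so the word is still unread. Returns 0 at end of file, -1 on error. */
int next_word_span (FILE *file)
{
    long pos;
    int n = 0;

    assert (file != NULL);

    pos = ftell (file);                 /* remember where we are */
    if (pos < 0)
        return -1;

    /* %*s scans a word without storing it; %n then records the count.
     * If no word is found, %n is never reached and n stays 0. */
    if (fscanf (file, "%*s%n", &n) == EOF)
        n = 0;

    if (fseek (file, pos, SEEK_SET) != 0)
        return -1;

    return n;
}
```

Since the count includes any skipped leading white space, allocating n + 1 bytes is always enough for the word itself, at the cost of a few spare bytes.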
chux - Reinstate Monica