I have a text file containing words stored in an unstructured way (that is, separated by arbitrary spaces and blank lines). For example:

file.txt

word1 word2              word3 
         word4 
                        word5

     word6 

I want to read each of these words into a char array. I tried the following:

FILE *fp;

fp = fopen("file.txt","r");


int numWords =0;
char *arr = malloc(sizeof(char *));
while(!feof(fp)){
    fscanf(fp, "%s", arr);
    numWords++; 
}

fclose(fp);

For some reason, I can't access each word from the array, i.e. I'm expecting printf("%s", arr[0]) to print word1, and so on. However, arr[0] stores a single character, in this case w.

There is also another problem. I put a printf statement in the while loop and it prints the last word, word6, twice, meaning the loop is executed an extra time at the end for some reason.

If someone could show me how to achieve this, it would be much appreciated. Thanks!

Programmer
  • See: [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/q/5431941/4389800) – P.P Oct 10 '17 at 10:53
  • If you need to store all the words, you need an array of pointers (or arrays). Right now you just have a `char*` (which is also not enough in size). – Ajay Brahmakshatriya Oct 10 '17 at 10:56
  • In C a string is an array of characters, terminated by a zero. Your `malloc` call basically asks the system to allocate an *array of characters* that you then pass to [`fscanf`](http://en.cppreference.com/w/c/io/fscanf) which will read a "word" into this array and add a terminator. There is a slight problem with this: You only allocate space for a single `char *` (usually 4 or 8 bytes) and then read a word that might be longer into that memory. – Some programmer dude Oct 10 '17 at 10:56
  • As for your problem, what you need is an array of pointers (of type `char **`) and then allocate enough memory for each entry in that array to hold a string. You then pass the pointers in the array to `fscanf` as a destination for the word. – Some programmer dude Oct 10 '17 at 10:57
  • Hi @Someprogrammerdude. Thanks for your response. I changed the array to `char **arr = (char **)malloc (1000 * sizeof (char *));` How do I use fscanf now? – Programmer Oct 10 '17 at 11:09
  • sample [code](https://ideone.com/8sz8m5) – BLUEPIXY Oct 10 '17 at 11:19
  • @BLUEPIXY ah, thanks very much! – Programmer Oct 10 '17 at 11:42
  • @novice, to use BLUEPIXY's great answer: after typing the input values on `stdin`, press `Ctrl` + `D` to signal `EOF`. I read about it in [How to write `EOF`](https://stackoverflow.com/questions/3061135/can-we-write-an-eof-character-ourselves) – EsmaeelE Oct 10 '17 at 12:01
  • @novice you're welcome, but please remove thank-you comments – EsmaeelE Oct 10 '17 at 12:06

1 Answer

Your code simply has undefined behavior, so it's impossible to reason about until you remove it.

The malloc call allocates room for a single char * pointer, which typically means 8 or 4 bytes. That's all. There's no room to save a lot of word data in there. C won't automatically grow the array or anything like that; you need to deal with the allocation of every byte of storage that you need. When you go ahead and write outside your allocated space, you get undefined behavior.

To store words like this, you might want to implement a dynamic pointer array. That will deal with storing any number of pointers; the pointers (words) themselves will need to be separately allocated on the heap before being added to the array. This is quite a lot of code.
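As a rough idea of what such a dynamic pointer array could look like, here is a minimal sketch. The names `struct word_list`, `word_list_add`, `word_list_read` and `word_list_free` are made up for illustration, and the 255-character scratch buffer is an assumed limit, not something the question requires. Note that the read loop tests the return value of `fscanf` rather than `feof`.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: a growable array of heap-allocated strings. */
struct word_list {
    char **items;      /* pointers to the individual words */
    size_t count;      /* number of words stored */
    size_t capacity;   /* number of pointer slots allocated */
};

/* Appends a copy of word; returns 0 on success, -1 if allocation fails. */
int word_list_add(struct word_list *wl, const char *word)
{
    if (wl->count == wl->capacity) {
        /* Double the pointer array when it is full (start with 16 slots). */
        size_t new_cap = wl->capacity ? wl->capacity * 2 : 16;
        char **tmp = realloc(wl->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        wl->items = tmp;
        wl->capacity = new_cap;
    }

    /* Each word gets its own allocation, including the terminating '\0'. */
    size_t len = strlen(word) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, word, len);

    wl->items[wl->count++] = copy;
    return 0;
}

/* Reads every whitespace-separated word from fp; returns 0 on success. */
int word_list_read(struct word_list *wl, FILE *fp)
{
    char buf[256];  /* assumed limit: longer words are split at 255 chars */

    /* fscanf returns 1 only when a word was actually read. */
    while (fscanf(fp, "%255s", buf) == 1)
        if (word_list_add(wl, buf) != 0)
            return -1;
    return 0;
}

/* Releases every word and the pointer array itself. */
void word_list_free(struct word_list *wl)
{
    for (size_t i = 0; i < wl->count; i++)
        free(wl->items[i]);
    free(wl->items);
    wl->items = NULL;
    wl->count = wl->capacity = 0;
}
```

Start from a zero-initialized list (`struct word_list wl = {0};`), call `word_list_read(&wl, fp)`, and then `wl.items[0]` is the first word as a proper string.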

If you're willing to live with some static limitations (on word length and word count), you can of course do:

char words[1000][30];

That'll give you space for 1000 words of at most 29 characters each (the 30th byte of each row is needed for the terminating '\0'). You might want to think about de-duplicating the data, i.e. checking if a word is already stored before storing it again.
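A read loop for such a fixed-size array might look like the sketch below. The names `read_words`, `MAX_WORDS` and `MAX_LEN` are placeholders chosen for this example. The loop condition checks the return value of `fscanf` instead of `feof`, which should also avoid the last word being processed twice, and the `%29s` width keeps a long word from overflowing its row.

```c
#include <stdio.h>

#define MAX_WORDS 1000
#define MAX_LEN   30   /* 29 characters plus the terminating '\0' */

/* Reads whitespace-separated words from fp into words[]; returns the count. */
size_t read_words(FILE *fp, char words[][MAX_LEN], size_t max_words)
{
    size_t n = 0;

    /* %29s stops after 29 characters, so a long word cannot overflow a row;
       the rest of such a word is then read as the next word.
       fscanf returns 1 only when a word was actually stored. */
    while (n < max_words && fscanf(fp, "%29s", words[n]) == 1)
        n++;

    return n;
}
```

With `char words[MAX_WORDS][MAX_LEN];` declared by the caller, `size_t n = read_words(fp, words, MAX_WORDS);` fills the array, and `printf("%s\n", words[0]);` prints the first word.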

unwind
  • Hi @unwind, how would I implement the loop with fscanf if I have char words[1000][30]? Thanks! – Programmer Oct 10 '17 at 11:18
  • @novice use `char words[1000][100];`. The second dimension (100) is the number of characters per line and the first (1000) is the number of lines. You can use [this code](https://ideone.com/LUxzR0) – EsmaeelE Oct 10 '17 at 12:39