0

I have a file named "grwords.txt".

How can I assign the file to an array and print a random word from it? My text file contains around 540000 words. This is my code:

int main(int argc, char *argv[]) {
    FILE *file;
    int i, random;
    char words[540000][25];
    file = fopen("grwords.txt", "r");
    if (file == NULL){
        printf("The file cannot be opened.\n");
        return 1;
    }
    random = rand();
    fprintf(file, "%lf\n", random);
    fclose(file);
    return 0;
}
samus
user3601507
  • That code is rather, if you pardon the pun, random. – unwind May 08 '14 at 14:15
  • fprintf is writing to your file, not reading from it. – samus May 08 '14 at 14:16
  • Yes, I haven't finished it yet. I just posted what I had. I was actually trying to print a random word from a file without assigning the file to an array but alas it doesn't work. – user3601507 May 08 '14 at 14:17
  • 1
    Your code will not go farther than `char words[540000][25];`. You can't allocate ~12MB on the stack. – edmz May 08 '14 at 14:17
  • Maybe you don't want to read the whole file, count the words in it, and then pick one -- you could do it by sequentially reading the file word for word and then applying this algorithm which works with sequences of unknown length, reading them just once: http://propersubset.com/2010/04/choosing-random-elements.html – Peter - Reinstate Monica May 08 '14 at 14:28
  • If you want an algorithm where you only need to store one line at a time look at the second answer to this question (code is in C#, so you need to change it to C): http://stackoverflow.com/questions/3745934/read-random-line-from-a-file-c-sharp/3745973#3745973 – Klas Lindbäck May 08 '14 at 14:28
  • @KlasLindbäck :-) I thought it is an awesome algorithm. – Peter - Reinstate Monica May 08 '14 at 14:30
  • @black - Not sure where you concluded stack is limited to less than 12Mb. ***[That statement in and of itself is not true](http://stackoverflow.com/a/1034081/645128)***. (32 bit windows allows 2Gb, 64 bit windows allows 4Gb) – ryyker May 08 '14 at 15:41
  • @ryyker Yes, but you must specify that. And if I forget to do that, I'll run into a stack overflow. Dynamic memory allocation was invented for that purpose. – edmz May 08 '14 at 18:41
  • @black - I agree with your last statement: _but you must_ ***specify*** _that_. My issue was with your previous statement: _You can't allocate ~12MB on the stack_. – ryyker May 08 '14 at 19:38
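
The single-pass idea linked in the comments above (often called reservoir sampling) can be sketched roughly as follows. This is only an illustration, not code from any of the commenters: `pick_random_word`, `big_rand` and `MAX_WORD` are made-up names, words are assumed to be whitespace-separated and shorter than 64 bytes, and two `rand()` calls are combined because `RAND_MAX` may be as small as 32767.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 64   /* assumed upper bound on word length, including '\0' */

/* Hypothetical helper: combines two rand() calls so the range exceeds RAND_MAX.
   Taking the result modulo a count leaves a small bias, tolerable in a sketch. */
static unsigned long long big_rand(void)
{
    unsigned long long span = (unsigned long long)RAND_MAX + 1ULL;
    return (unsigned long long)rand() * span + (unsigned long long)rand();
}

/* Reads the words from fp exactly once. The k-th word read replaces the
   current choice with probability 1/k, so only one word is stored at a time.
   out_size must be at least 1.
   Returns 1 and fills out on success, 0 if the file contained no words. */
int pick_random_word(FILE *fp, char *out, size_t out_size)
{
    char word[MAX_WORD];
    unsigned long long seen = 0;

    while (fscanf(fp, "%63s", word) == 1) {
        seen++;
        if (big_rand() % seen == 0) {
            strncpy(out, word, out_size - 1);
            out[out_size - 1] = '\0';
        }
    }
    return seen > 0;
}
```

Call `srand((unsigned)time(NULL))` once at program start, open the file with `fopen`, and pass the stream to `pick_random_word`. No large array is needed, at the cost of reading the whole file for every word picked.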

4 Answers

0
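#include <stdio.h>
#include <stdlib.h>

void ReadFile(char *name)
{
    FILE *file;
    char *buffer;
    long fileLen;
    size_t bytesRead;

    //Open file
    file = fopen(name, "rb");
    if (!file)
    {
        fprintf(stderr, "Unable to open file %s\n", name);
        return;
    }

    //Get file length (ftell returns a long, or -1 on failure)
    fseek(file, 0, SEEK_END);
    fileLen = ftell(file);
    fseek(file, 0, SEEK_SET);
    if (fileLen < 0)
    {
        fprintf(stderr, "Unable to determine the size of %s\n", name);
        fclose(file);
        return;
    }

    //Allocate memory (one extra byte for the terminating '\0')
    buffer = (char *)malloc((size_t)fileLen + 1);
    if (!buffer)
    {
        fprintf(stderr, "Memory error!\n");
        fclose(file);
        return;
    }

    //Read file contents into buffer and terminate it so it can be used as a string
    bytesRead = fread(buffer, 1, (size_t)fileLen, file);
    buffer[bytesRead] = '\0';
    fclose(file);

    //Do whatever with buffer

    free(buffer);
}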
samus
  • I thought you shouldn't cast the result of `malloc`, see: http://stackoverflow.com/a/605858/3488231 – user12205 May 08 '14 at 14:26
  • 1
    @ace opinions differ there ... C++ folks (including me) like the cast, and I'm not convinced by any of unwind's arguments, with all due respect ;-). – Peter - Reinstate Monica May 08 '14 at 14:31
  • @SamusArin - Someone went through and did a drive by down vote for no apparent reason. There is nothing that I can see wrong with your answer. For what it is worth, +1. My answer had not been posted yet, I was still editing. – ryyker May 08 '14 at 15:58
  • welcome, I hate drive by down votes too. – ryyker May 08 '14 at 16:38
  • I'm going to keep an eye out for them now, and fix where appropriate. – samus May 08 '14 at 16:40
0

Open the file. Determine its size. Find a random file offset between 0 and size and fseek to it. Scan a word and discard it, because we might have jumped into the middle of a word and would then print only part of it. Scan the next word and print it. Close the file.

Of course we may have jumped to a position close to the end of the file, so we need to take care of reaching the end of the file when reading the word by rewinding the file. As a corner case, the file may not have any words in it.

If your file size is greater than RAND_MAX, you must find a way to compose a random number greater than that, e.g.:

long long pos = ((long long)rand() * ((long long)RAND_MAX + 1) + rand()) % size;

File seeking is obviously not a good method if you need to print random words from the file repeatedly.
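
The steps above could look roughly like this. It is a minimal sketch under a few assumptions: `random_word_by_seek` is a made-up name, the stream is opened in binary mode (`"rb"`) so that byte offsets from `ftell` can be passed to `fseek`, the caller's buffer holds at least 64 bytes, and no word is longer than 63 bytes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: jumps to a random byte offset, discards the possibly
   partial word found there, and reads the next complete word into out.
   If the end of the file is reached first, it wraps around to the start.
   Returns 1 on success, 0 if the file is empty or holds no words. */
int random_word_by_seek(FILE *fp, char *out /* at least 64 bytes */)
{
    long size;
    unsigned long long r;

    if (fseek(fp, 0, SEEK_END) != 0)
        return 0;
    size = ftell(fp);
    if (size <= 0)
        return 0;

    /* Two rand() calls combined, because size may exceed RAND_MAX. */
    r = (unsigned long long)rand() * ((unsigned long long)RAND_MAX + 1ULL)
        + (unsigned long long)rand();
    if (fseek(fp, (long)(r % (unsigned long long)size), SEEK_SET) != 0)
        return 0;

    fscanf(fp, "%*s");                    /* discard the partial word */
    if (fscanf(fp, "%63s", out) == 1)
        return 1;

    rewind(fp);                           /* hit the end: take the first word */
    return fscanf(fp, "%63s", out) == 1;
}
```

As with the description above, the choice is not uniform: a word that follows a long word is picked more often than one that follows a short word.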

M Oehm
  • @ryyker: Thanks. Good thing I was wearing my downvote-proof vest. But maybe the downvote was justified, because now that I look at it, I see the flaw in my idea: The words are not chosen with equal probability. The word after "internationalisation" will get picked ten times as often as the word after "if". – M Oehm May 08 '14 at 16:36
  • Someone went through and did a drive by down vote for no apparent reason. There is nothing that I can see wrong with your answer. ***Addressing the limitations of*** `rand()` is worth at least +1. (My answer had not been posted yet, I was still editing.) – ryyker May 08 '14 at 16:37
  • Lol. Yes, maybe, but I suspected shenanigans when one second there were no down votes, and the next, every answer posted had a down vote. No comments were left to explain why. – ryyker May 08 '14 at 17:52
0

How can I assign the file to an array and print a random word from it?

A) Read the file once to get two parameters: 1: the number of words, 2: the length of the longest word (used for memory allocation later).
B) Allocate memory for an array of strings (e.g. char **strings;) to read the file into.
(Note: you have chosen to use char words[540000][25];, which will also work, but is not flexible.)
C) Using fopen() and fgets(), read each word into the array strings.
D) Use srand() and rand() to produce a pseudo-random number from 0 to numWords - 1.
(Note: rand() by itself only produces numbers from 0 to RAND_MAX, which is guaranteed to be at least 32767 but may be no larger. If a bigger number is needed, combine several calls to rand() to produce one.)
E) Use printf("Random word is %s", strings[randNum]); to print the random word.

Your code segment is a start, but you are missing a few key elements. One of which is shown here:

At this point in your code:

random = rand();
fprintf(file, "%lf\n", random);
fclose(file);  

You still have not read the words from the opened file into a string array words, as you stated you wanted to. That should be done before this last section. Something like this should work:

    #define DELIM "- .,:;//_*&\n"  //or use char DELIM[]="- .,:;//_*&\n"

    //...other code

    char *buf;
    char line[260];
    int cnt=0;

    while(fgets(line, sizeof line, file))
    {
        buf = strtok(line, DELIM);
        while(buf)
        {
            //strtok never returns an empty token, so only three checks are needed:
            //room left in words, the token fits in 25 bytes, and it is not a stray tab
            if((cnt < 540000) && (strlen(buf) < 25) && (buf[0] != '\t'))
            {
                strcpy(words[cnt++], buf);
            }
            buf = strtok(NULL, DELIM);
        }
    }
    //... other code
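
For steps B) to E) with a dynamically allocated array instead of the fixed char words[540000][25], a rough sketch might look like the following. It is a variation rather than the exact recipe above: the array grows while reading, so the separate counting pass from A) is not needed. `load_words`, `pick_word`, `free_words` and `MAX_WORD` are made-up names, and words are assumed to be whitespace-separated and shorter than 64 bytes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORD 64   /* assumed upper bound on word length, including '\0' */

/* Hypothetical helper: reads every word from fp into a heap-allocated array
   of strings, doubling the array whenever it is full. Stores the number of
   words in *count; returns NULL if nothing could be stored. */
char **load_words(FILE *fp, size_t *count)
{
    char word[MAX_WORD];
    char **list = NULL;
    size_t used = 0, cap = 0;

    while (fscanf(fp, "%63s", word) == 1) {
        if (used == cap) {
            size_t new_cap = cap ? cap * 2 : 1024;
            char **tmp = realloc(list, new_cap * sizeof *tmp);
            if (tmp == NULL)
                break;                    /* keep the words read so far */
            list = tmp;
            cap = new_cap;
        }
        list[used] = malloc(strlen(word) + 1);
        if (list[used] == NULL)
            break;
        strcpy(list[used], word);
        used++;
    }
    *count = used;
    return list;
}

/* Picks one stored word; two rand() calls are combined because count
   may be larger than RAND_MAX. Returns NULL for an empty list. */
const char *pick_word(char **list, size_t count)
{
    unsigned long long r = (unsigned long long)rand()
        * ((unsigned long long)RAND_MAX + 1ULL) + (unsigned long long)rand();
    return count ? list[r % count] : NULL;
}

/* Releases everything allocated by load_words. */
void free_words(char **list, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        free(list[i]);
    free(list);
}
```

After a single srand(time(NULL)) at startup, pick_word can be called as often as needed without touching the file again, which is the main reason to load the words into memory.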
ryyker
0
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char *argv[]) {
    static char words[540000][25];
    FILE *file;
    int i, j;
    size_t cnt, n;
    char word[25];

    srand(time(NULL));
    file = fopen("grwords.txt", "r");
    if (file == NULL){
        printf("The file cannot be opened.\n");
        return 1;
    }
    cnt = 0;
    while(1==fscanf(file, "%24s", word)){
        if(cnt == 540000)
            break;
        strcpy(words[cnt++], word);
    }
    fclose(file);
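    n = cnt;
    if(n == 0){
        /* avoid rand() % 0 below when the file contains no words */
        printf("No words were read from the file.\n");
        return 1;
    }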
    if(n > RAND_MAX){
        int part;
        size_t random = 0;
        i = n / RAND_MAX;
        part = rand() % (i+1);
        if(part == i){
            j = n % RAND_MAX;
            if(j != 0)
                random = random + RAND_MAX*part + rand() % j;
            else
                random = random + RAND_MAX*(part-1) + rand();
        } else {
            random = random + RAND_MAX*part + rand();
        }
        printf("%s\n", words[random]);
    } else {
        int random = rand() % n;
        printf("%s\n", words[random]);
    }
    return 0;
}
BLUEPIXY
  • Someone went through and did a drive by down vote for no apparent reason. There is nothing that I can see wrong with your answer. For what it is worth, +1. My answer had not been posted yet, I was still editing. – ryyker May 08 '14 at 16:02
  • This works great. But isn't there a shorter way to do it? I'm not complaining or anything but I remember that our professor said that it should take around 3-4 lines to do. – user3601507 May 10 '14 at 18:17
  • @user3601507 I'm sure it could be put together a little shorter, but I don't think that would make it any more effective. I think it is more important to leave the thought process visible. – BLUEPIXY May 11 '14 at 07:47
  • Yeah I was trying so hard to find a way to make it shorter but I'm not familiar with C enough. – user3601507 May 11 '14 at 13:15