I'm writing a program in C that processes a text file and keeps track of each unique word (using a struct that holds a char array for the word and a count of its occurrences), storing these structs in a data structure. However, the assignment includes this requirement: "The entire txt file may be very large and not able to be held in the main memory. Account for this in your program."
I asked my professor after class, and he said to read the text file X lines at a time (I think 20,000 was his suggestion?), analyze them and update the structs, and repeat until you've reached the end of the file.
Can anyone help explain the best way to do this and tell me what functions to use? I'm very, very new to C.
(My current program is correct for small files; I just need to make it accommodate enormous files.)
Thank you so much!!
EDIT:
fp = fopen(argv[w], "r");
if (fp == NULL) {
    fprintf(stderr, "Input file %s cannot be opened.\n", argv[w]);
    return 2;
}

/* other parts of my program here */

char s[MaxWordSize];
while (fscanf(fp, "%s", s) != EOF) {
    nonAlphabeticDelete(s); // removes non-letter characters
    toLowerCase(s);         // converts the string to lowercase
    // attempt to add to the data structure
    pthread_mutex_lock(&lock);
    add(words, &q, s);
    pthread_mutex_unlock(&lock);
}
This works, I just need to adjust it to go X lines at a time through the text file.
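EDIT 2: Here's roughly the shape I'm imagining based on his suggestion — a minimal sketch using `fgets()` and `strtok()`. The names and sizes (`count_words`, `MAX_LINE`, `BATCH_LINES`) are placeholders I made up, and the real `nonAlphabeticDelete()` / `toLowerCase()` / `add()` calls would go where the comment says. Is this the right idea?

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096      /* assumed maximum line length; adjust as needed */
#define BATCH_LINES 20000  /* the suggested batch size */

/* Sketch: read the file one line at a time with fgets(), splitting each
   line into whitespace-separated words. Memory use is bounded by one line
   buffer plus the word-count structs, never the whole file. Returns the
   total number of words seen; the real program would instead clean each
   token and pass it to add(). */
long count_words(FILE *fp)
{
    char line[MAX_LINE];
    long total = 0;
    long lines_in_batch = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* split the line into whitespace-separated tokens */
        for (char *tok = strtok(line, " \t\r\n"); tok != NULL;
             tok = strtok(NULL, " \t\r\n")) {
            total++;  /* here: nonAlphabeticDelete(), toLowerCase(), add() */
        }

        if (++lines_in_batch == BATCH_LINES) {
            /* one batch of lines done; the counts are already folded into
               the structs, so nothing extra needs to be flushed */
            lines_in_batch = 0;
        }
    }
    return total;
}
```

Since the counts are updated as each line is read, there's never a point where more than one line of the file is in memory at once.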