
I'm trying to find the most frequent word in a text. My program reads some words and then some text; the words and the text are separated by "-----" (the program has to find which of the words occurs most often in the text).

However, when the program searches for the words in the text, it seems that it never leaves the loop (I get Time Limit Exceeded on PC^2). I then found that the problem comes from this function (I get a Wrong Answer error instead if I comment this function out). Do I misunderstand the usage of scanf or miss some other condition?

void inputTextTxt(void) {

  for (;;) {
   // toss all non-alpha-numerics
   scanf("%*[^a-zA-Z0-9_]");

   int cnt = scanf("%2047[a-zA-Z0-9_]", tmp);
   if (cnt != 1) {
     break; // or return
   }

   for (size_t i = 0; i < dic_actual_num; ++i) {
     if (strcmp(dicWord[i], tmp) == 0) {
       dicWcount[i]++;

     }
   }
  }
}
  • Any character that is not a digit, a letter, or '_' should be treated as a space
  • The maximum length of each line is 1024

Other parts of my code

char tmp[2048];
char **dicWord;
int *dicWcount;
int dic_assume_num = 1, dic_actual_num = 0;

void inputDicTxt() {

    char divider[6] = "-----";
    dicWord = malloc( dic_assume_num * sizeof( char* ));

    for (;;) {

        scanf("%*[^a-zA-Z0-9_-]");
        int cnt_divider = scanf("%2047[-]", tmp);
        int cnt_alphaNumerics = scanf("%2047[a-zA-Z0-9_]", tmp);

        if (cnt_divider != 1 && cnt_alphaNumerics != 1)
            break;

        else if (cnt_divider) {
            if (strcmp(tmp, divider) >= 0) {
                dicWcount = calloc(dic_actual_num,  sizeof(*dicWcount));
                break;
            }
        }
        else if (cnt_alphaNumerics) {
            if (dic_actual_num >= dic_assume_num) {
                dic_assume_num *= 2;
                dicWord = realloc( dicWord, dic_assume_num * sizeof( char* ));
            }
            dicWord[dic_actual_num++] = strdup(tmp);
        }
    }
}


int main() {

    inputDicTxt();
    inputTextTxt();

    int mostNum = 0;

    for (int i = 0; i < dic_actual_num; ++i)
        if (dicWcount[i] > dicWcount[mostNum]) 
            mostNum = i;

    // print out the most frequent word and its number
    printf("%s %d\n", dicWord[mostNum], dicWcount[mostNum]);


    for (int i = 0; i < dic_actual_num; ++i)
        free(dicWord[i]);
    free(dicWord);
    free(dicWcount);

    return 0;
}

EDIT: I've changed from while (!feof(stdin)) to for (;;) in my code, but I still get TLE on the judging system.

1 Answer


Do I misunderstand the usage of scanf or miss some other condition?

Be sure to review Why is “while ( !feof (file) )” always wrong? @alk

Yet in this case, the code uses while (!feof(stdin)) in an almost acceptable fashion, so the issue may lie elsewhere. In any case, avoid while (!feof(stdin)): it is fragile and tends to attract I/O problems.


It is not clear why OP's code appears to loop forever, even with the unusual while (!feof(stdin)) {, other than a rare input error or undefined behavior such as tmp being too small.

Below is similar code that should be cleaner to run and debug.

OP uses while (!feof(stdin)) in a precarious manner: the loop never ends if an iteration fails to read a character without reaching end-of-file, for example on an input error.

Instead of while (!feof(stdin)), check the return value of the scanf() call that saves data. Do not simply test it as a boolean, as in if (scanf("%2047[a-zA-Z0-9_]", tmp)): EOF is non-zero, so that test also succeeds at end-of-file. Check its value.

void inputTextTxt(void) {
  for (;;) {
   // toss all non-alpha-numerics
   scanf("%*[^a-zA-Z0-9_]");

   char tmp[2048]; 
   int cnt = scanf("%2047[a-zA-Z0-9_]", tmp);
   if (cnt != 1) {
     break; // or return
   }
   for (size_t i = 0; i < dic_actual_num; ++i) {
     if (strcmp(dicWord[i], tmp) == 0) {
       dicWcount[i]++;
       // I'd expect a `break;` here as once a match is found, 
       // could another be found?   Why keep looking?
     }
   }
  }
}

Pedantically, scan set ranges like "%[a-z]" are not universally implemented: how '-' is treated inside a scan set is implementation-defined behavior. Highly portable code would require "%[abcdefghijklmnopqrstuvwxyz]". Most systems understand "%[a-z]" as hoped, so I doubt this is OP's problem.
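
For code that must not rely on scan set ranges at all, one option is to skip scanf() for tokenizing and classify characters with isalnum() instead. The following is only a minimal sketch, not OP's code: is_word_char() and next_word() are hypothetical helper names, the FILE * parameter is an assumption made so the function can be tried on any stream (pass stdin on the judge), and isalnum() matches exactly the letters and digits only in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: a word character is a letter, a digit, or '_'. */
int is_word_char(int ch) {
    return ch == '_' || isalnum((unsigned char)ch);
}

/* Hypothetical helper: reads the next word from `in` into buf, which holds
   cap bytes including the terminator. Returns 1 if a word was stored and
   0 once the input is exhausted. */
int next_word(FILE *in, char *buf, size_t cap) {
    assert(in != NULL && buf != NULL && cap > 1);

    int ch;
    /* Skip everything that is not a word character. */
    while ((ch = fgetc(in)) != EOF && !is_word_char(ch)) {
        ;
    }
    if (ch == EOF) {
        return 0;  /* end-of-file or input error: stop the caller's loop */
    }

    size_t len = 0;
    do {
        if (len + 1 < cap) {
            buf[len++] = (char)ch;  /* excess characters of an overlong word are dropped */
        }
        ch = fgetc(in);
    } while (ch != EOF && is_word_char(ch));

    if (ch != EOF) {
        ungetc(ch, in);  /* leave the separator for the next call */
    }
    buf[len] = '\0';
    return 1;
}
```

A loop such as while (next_word(stdin, tmp, sizeof tmp)) { ... } could then stand in for the two scanf() calls in inputTextTxt(). Because the function returns 0 on both end-of-file and input error, the loop cannot spin without consuming input.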


From a style point of view, if (scanf("%2047[^a-zA-Z0-9_]", tmp)); else if is misleading: the empty statement looks like a mistake. If the code keeps this form, use { ; } to demarcate it clearly.

if (scanf("%2047[^a-zA-Z0-9_]", tmp)) {
  ;
} else if ...
chux - Reinstate Monica
  • I've changed from `while (!feof(stdin))` to `for(;;)` in all of my code, but I still get the same error. Do you have any other suggestions? The other parts of my code are posted. – LINPOHSIEN Jan 05 '18 at 15:46
  • I think `"%[a-z]"` is fine because it works in one test. However, it fails in another one. – LINPOHSIEN Jan 05 '18 at 15:55
  • Curious that your edit coded `else if (scanf("%2047[a-zA-Z0-9_]", tmp)` and did not heed the advice [Check its value](https://stackoverflow.com/a/48114994/2410359). Why follow the `if()` block when `EOF` is returned? Same for `if (scanf("%2047[-]", tmp))`. – chux - Reinstate Monica Jan 05 '18 at 15:56
  • Does it look better now? I check their values before the `if` statement; when `EOF` is returned, it will break. – LINPOHSIEN Jan 06 '18 at 09:14
  • @LINPOHSIEN `if (cnt_divider != 1 && cnt_alphaNumerics != 1)` has 4 combinations: `F&&F`, `F&&T`, `T&&F`, `T&&T`. Your code does not work well for `F&&T` and `T&&F`, so "when EOF is returned, it will break" is false. – chux - Reinstate Monica Jan 06 '18 at 13:08
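
The point of the last comment is that each scanf() result needs its own check rather than one combined test. A rough sketch of that idea follows. It is not OP's code: next_token(), enum token, and the FILE * parameter are hypothetical names introduced for illustration, and treating a run of dashes other than exactly "-----" as a plain separator is an assumption that the problem statement does not confirm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds for the dictionary part of the input. */
enum token { TOK_END, TOK_DIVIDER, TOK_WORD };

/* Hypothetical helper: reads the next token from `in` into buf, which must
   hold at least 2048 bytes. Every fscanf() result is checked on its own. */
enum token next_token(FILE *in, char *buf) {
    assert(in != NULL && buf != NULL);

    for (;;) {
        /* Skip separators. 0 only means "nothing was assigned"; EOF means
           the input ended before anything could be read. */
        if (fscanf(in, "%*[^a-zA-Z0-9_-]") == EOF) {
            return TOK_END;
        }

        int cnt = fscanf(in, "%2047[-]", buf);
        if (cnt == EOF) {
            return TOK_END;
        }
        if (cnt == 1) {
            if (strcmp(buf, "-----") == 0) {
                return TOK_DIVIDER;
            }
            continue;  /* assumption: any other run of dashes is a separator */
        }

        /* cnt == 0: the next character is not '-', so try to read a word. */
        cnt = fscanf(in, "%2047[a-zA-Z0-9_]", buf);
        if (cnt == 1) {
            return TOK_WORD;
        }
        return TOK_END;  /* EOF or an unexpected failure: never loop on it */
    }
}
```

inputDicTxt() could then loop on next_token(stdin, tmp), storing words until TOK_DIVIDER or TOK_END arrives, which removes the need to reason about combinations of two return values.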