
I have seen many questions related to this error, but none that match my situation. I am running a model written in mixed C and Fortran on an LSF platform. The weird thing is that the model ran fine until last week, when it started throwing this error. Even weirder, the error does not occur every time: sometimes the model runs without error, and sometimes the job aborts while reading the input files. The error points to code that I have never modified. So far I have tried:

1) Recompile the source code and use the newly created executable;

2) Copy the executable from another directory where the model runs fine;

3) Delete the whole directory, create a new one, and repeat the two steps above;

4) Start from a fresh login;

5) Run only one job at a time, to rule out interference from other jobs running on the same node;

6) Change the job name;

7) Change the run length (model years).

The error still occurs about 90% of the time. It points to the `free(line)` call in the inpakC.c file (part of the file is attached below). I do not see anything wrong with it, since it is prewritten code. Any help or advice would be greatly appreciated!


#ifdef MPI
int ipck_LoadF(char *filename, MPI_Comm comm)
 #else
int ipck_LoadF(char *filename)
#endif
 {
   /*local variables */
  FILE *fileptr;                /*pointer to the file */
  int bsize;                    /*buffer size (this was the default used in         

  int maxLsize;                 /*max line size(this was the default used in 
  char *line;                   /*the next line in the file */
  int n, m, clrt;
  int my_address;
  int c;

  my_address =0;
  #ifdef MPI
  MPI_Comm_rank(comm, &my_address);
  #endif
 if(my_address == 0){
  bsize = 0;
  maxLsize = 0;
  clrt = 1;  /*current line running total set to zero*/

  /*open the file */
  /*if the file was not opened, exit and return 1 */
  if ((fileptr = fopen(filename, "r")) == NULL)
  {
return 1;
  }
  /*go through file and count the number of elements - used to know how much mem to allocate*/
  while ((c = fgetc(fileptr)) != EOF)
   {
     bsize++;
     clrt++;
     /*get length of longest line*/
     if (c  == '\n')/*end of the line has been reached*/
       {
         if (clrt > maxLsize)/*line contains the most char so far*/
        {
         maxLsize = clrt;
         clrt = 1;
         }
      else  /*line has less char than the record so just reset the counter*/
      {
         clrt = 1;
         }
        }
       }
   /*allocate mem for the buffer*/
   buffer = (char *) calloc(bsize, sizeof(char));
   /*postion pointer back to the begining*/
   rewind(fileptr);

   /*read the contents of the file into the buffer variable */
   while (!feof(fileptr))
    {
      /*allocate memory to hold the line to read into and the trimmed line */
      line = (char *) calloc(maxLsize, sizeof(char));

      /*get the next line */
      fgets(line, maxLsize, fileptr);

      /*see if the next line is blank; if so skip the rest
      and continue   retrieving lines*/
      if( strcmp(line, "\n")==0) continue;

      /*get the position of the comment character.
      if one does not exist, it will return the length of the string*/                                                        
      n=strcspn(line,"#");

      m=n-2;
      while (*(line+m)==' ' || *(line+m)=='/' || *(line+m)=='\n'){
      n--;
      m--;
        }

if (n > 0){
  /*cat n-1 chars to the buffer     */
  strncat(buffer,line,n-1);
}


/*put a padded space after the new line added to the buffer */
strcat(buffer, " ");
/*clean up strings and flush */
free(line);
fflush(fileptr);
}
 /*close the file */
 fclose(fileptr);
   }
        /*broadcast to all of the nodes*/
         #ifdef MPI
         MPI_Bcast(&bsize,1,MPI_INT,0,comm);
        if (my_address != 0)
         buffer = (char *) calloc(bsize, sizeof(char));
          MPI_Bcast(buffer,bsize,MPI_CHAR,0,comm);
        #endif
       return 0;
     }
MD XF
harmony
  • Use [valgrind](http://valgrind.org). If you're performing invalid memory access, it will tell you where. – dbush Jul 07 '16 at 19:42
  • You might want to start by fixing your indentation and incorrect commenting. – EOF Jul 07 '16 at 19:44
  • And don't post images of text! – too honest for this site Jul 07 '16 at 19:47
  • And see [Why is “while ( !feof (file) )” always wrong?](http://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – Andrew Henle Jul 07 '16 at 19:51
  • There are two comments in this code snippet that are not closed. – Kusalananda Jul 07 '16 at 19:57
  • From @AndrewHenle `while(fgets(line, maxLsize, fileptr) != NULL) {...}` with `line` allocation outside of the loop. – Weather Vane Jul 07 '16 at 19:59
  • There's a memory leak in the `while (!feof())`-loop if the `continue` statement is executed. – Kusalananda Jul 07 '16 at 20:05
  • @WeatherVane After `rewind(fileptr);`, there's another loop, where the return value from `fgets()` is ignored. That will result in the data from a failed `fgets()` being processed, and I think it gets `strcat()`'d to a buffer that seems to be the size of the file. In other words, quite likely to result in a SEGV when it overwrites the end of the buffer. Or an invalid `free()` because the heap is corrupted. – Andrew Henle Jul 07 '16 at 20:34
  • @AndrewHenle yes, that's where the `!feof()` is. I suggested restructuring that loop. – Weather Vane Jul 07 '16 at 20:39
  • Sorry guys, this is my first time using Stack Overflow, and I made lots of mistakes when copying the code. – harmony Jul 08 '16 at 05:40

1 Answer


The error means that something is calling free() on a pointer that is not valid to free: either it did not come from malloc() (or calloc()/realloc()), or it was already freed. This usually indicates memory corruption. Something in your program is overwriting memory that it shouldn't, probably by writing to an array with an invalid index. It is not at all unusual for this kind of problem to reproduce unpredictably, as you've described.

This kind of problem can be difficult to track down. Some approaches include:

  • put bounds checking assertions next to your array accesses

  • run under a tool like valgrind or AddressSanitizer, which attempt to detect this kind of problem.

  • study the contents of memory under a debugger and try to deduce what went wrong.

Lawrence D'Anna