
I'm starting to work with C++ again after a long break spent coding in Java. I'm trying to convert a data processing job I have over to C++. I've run into an issue while trying to open and write to 100+ files at once (splitting a 10GB text file into separate files by date). Again, I've only been back on C++ for about 2 days, so I'm sure my code is riddled with other issues, but I've created the simplest snippet that shows the problem.

  • I can change files_to_open to any number and I don't have any issues.
  • If files_to_write is <= 125 it runs fine.
  • If files_to_write is > 125 it crashes.

What would cause this?

#include <cstdio>
#include <map>
#include <sstream>

int main() {  
  std::map<int, FILE*> files;  
  int files_to_open = 200;  
  int files_to_write = 200;  

  // Open a set of files.  
  for(int i = 0; i < files_to_open; i++) {  
    std::ostringstream file_path;  
    file_path << "E:\\tmp\\file_" << i << ".txt";  
    files[i] = fopen(file_path.str().c_str(), "w");  
  }  

  // Write data to files.  
  for(int i = 0; i < files_to_write; i++) {  
    printf("%d\n", i);  
    fwrite("Some Data", sizeof(char), 9, files[i]);  
  }  

  // Close files.  
  for (auto& file : files) {  
    fclose(file.second);  
  }  

  // End it all.  
  printf("Press Any Key to Continue\n");  
  getchar();  
  return 0;  
}  
  • Where does it crash? In the writing loop, I presume? – Trojan Dec 13 '13 at 22:29
  • Yep, try running it in the debugger... – vines Dec 13 '13 at 22:30
  • Why aren't you using IOStreams?? – David G Dec 13 '13 at 22:31
  • You should check the return value of `fopen` to see if the file was successfully opened or not. Also, the C runtime has a default limit of [512 files open at once](http://stackoverflow.com/questions/870173/is-there-a-limit-on-number-of-open-files-in-windows). – David Brown Dec 13 '13 at 22:31
  • Just don't write to all of those files at the same time. Why not simply keep only one output file open at a time? – Till Dec 13 '13 at 22:55
  • A little more info. I'm looping through a text file that can be > 50GB at times, with > 50 million records. It comes to me unsorted. The current processing job, written in Java, takes about 2-3 hours to process a 50GB file. First it loops over the single file and splits it into 108 files by county. Then it loops over the 108 files and splits them by month, each county having ~156 files. To write to only one file at a time I would first have to sort the data, which would add to the processing time. Keeping the ~156 files open lets me open, write to, and close each file once while doing a single loop over the larger file. – user3101015 Dec 13 '13 at 23:50

1 Answer


I'm going to assume that fopen starts returning NULL after roughly 125 files, which is why the write loop crashes once files_to_write is > 125. Your OS has a per-process limit on the number of file handles that can be open at once, and you're probably hitting that limit.

125 makes perfect sense, since you already have 0 (stdin), 1 (stdout) and 2 (stderr) open; 125 more brings the total to 128, which is a typical power-of-two limit.

Either way, you should check the return value from fopen before blindly writing to a FILE*.
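
For illustration, here is a minimal sketch of that check, assuming the Windows/Microsoft CRT environment from the question; the `_setmaxstdio` call (Microsoft CRT only) and the value 1024 are just examples of raising the CRT's default FILE* stream limit, not something the crash necessarily requires:

#include <cstdio>
#include <map>
#include <sstream>

int main() {
#ifdef _MSC_VER
  // Raise the Microsoft CRT's FILE* stream limit (default 512).
  // Returns -1 if the requested maximum is rejected.
  _setmaxstdio(1024);
#endif

  std::map<int, FILE*> files;
  int files_to_open = 200;

  for (int i = 0; i < files_to_open; i++) {
    std::ostringstream file_path;
    file_path << "E:\\tmp\\file_" << i << ".txt";
    FILE* f = fopen(file_path.str().c_str(), "w");
    if (f == NULL) {
      // Report the failing path and stop, instead of storing a NULL
      // handle that a later fwrite would crash on.
      perror(file_path.str().c_str());
      break;
    }
    files[i] = f;
  }

  // ... write and close exactly as in the original snippet ...
  for (auto& file : files) {
    fclose(file.second);
  }
  return 0;
}

If fopen still returns NULL after raising the limit, the message perror prints next to the failing path tells you whether it is the handle limit or something else (bad directory, permissions) that is at fault.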

Paladine