
Currently we use C++ code to read files line by line, sort the lines, and save them in another format (a txt file); the lines read in are stored in a vector. This is all fine for small data files.
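For concreteness, the current approach presumably looks something like this minimal sketch (input.txt and sorted.txt are placeholder names): every line is pushed into a std::vector<std::string>, sorted, and written back out, so the whole file has to fit in RAM at once.

    #include <algorithm>
    #include <fstream>
    #include <string>
    #include <vector>

    int main() {
        std::ifstream in("input.txt");     // hypothetical input file
        std::ofstream out("sorted.txt");   // hypothetical output file

        std::vector<std::string> lines;
        std::string line;
        while (std::getline(in, line))
            lines.push_back(line);         // the whole file ends up in memory

        std::sort(lines.begin(), lines.end());

        for (const auto& l : lines)
            out << l << '\n';
    }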

But now we need to support large data files, which crash our code: there is not enough memory for the vector to reallocate and grow, and since we cannot know in advance how many lines the data will have, we cannot reserve a size for the vector.

So we are thinking we should redesign our code to handle large data. This time we would like to store the data in a way that lets us manipulate it (search, sort, insert, ...) both locally and as a whole.

I hope someone here can point me in the right direction: what languages, data structures, algorithms, etc. I could use.

  • You can sort the file in chunks, say 500 MB each, in memory and then merge the sorted chunks while keeping the order (see the sketch after these comments)... – sethi Jul 29 '13 at 18:19
  • Thanks for your suggestion. Sorry, I forgot to mention that the files include different sections, and each section needs to be dealt with at the same time. – user1558064 Jul 29 '13 at 18:25
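For what it's worth, a rough sketch of the external merge sort the comment above describes, assuming plain text lines; the function names, file names, and chunk limit are all made up and the chunk size would be tuned to the memory actually available. Phase 1 sorts each chunk in memory and spills it to a temporary file; phase 2 does a k-way merge with a min-heap, so only one line per chunk is held in memory during the merge.

    #include <algorithm>
    #include <fstream>
    #include <memory>
    #include <queue>
    #include <string>
    #include <vector>

    // Phase 1: read the input in chunks, sort each chunk in memory, spill to a temp file.
    std::vector<std::string> split_into_sorted_chunks(const std::string& input,
                                                      std::size_t max_lines_per_chunk) {
        std::ifstream in(input);
        std::vector<std::string> chunk_files;
        std::vector<std::string> chunk;
        std::string line;
        auto flush = [&]() {
            if (chunk.empty()) return;
            std::sort(chunk.begin(), chunk.end());
            std::string name = "chunk_" + std::to_string(chunk_files.size()) + ".tmp";
            std::ofstream out(name);
            for (const auto& l : chunk) out << l << '\n';
            chunk_files.push_back(name);
            chunk.clear();
        };
        while (std::getline(in, line)) {
            chunk.push_back(line);
            if (chunk.size() >= max_lines_per_chunk) flush();
        }
        flush();
        return chunk_files;
    }

    // Phase 2: k-way merge of the sorted chunk files with a min-heap,
    // so only one line per chunk is in memory at a time.
    void merge_chunks(const std::vector<std::string>& chunk_files, const std::string& output) {
        struct Entry { std::string line; std::size_t idx; };
        auto cmp = [](const Entry& a, const Entry& b) { return a.line > b.line; };
        std::priority_queue<Entry, std::vector<Entry>, decltype(cmp)> heap(cmp);

        std::vector<std::unique_ptr<std::ifstream>> streams;
        for (std::size_t i = 0; i < chunk_files.size(); ++i) {
            streams.push_back(std::make_unique<std::ifstream>(chunk_files[i]));
            std::string first;
            if (std::getline(*streams[i], first)) heap.push({first, i});
        }

        std::ofstream out(output);
        while (!heap.empty()) {
            Entry e = heap.top();
            heap.pop();
            out << e.line << '\n';
            std::string next;
            if (std::getline(*streams[e.idx], next)) heap.push({next, e.idx});
        }
    }

    int main() {
        // The chunk size is a stand-in; pick it so one chunk comfortably fits in memory.
        auto chunks = split_into_sorted_chunks("input.txt", 1000000);
        merge_chunks(chunks, "sorted.txt");
    }

The point about sections that have to be handled together would change how the chunks are cut, but the two-phase sort-then-merge pattern stays the same.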

1 Answer


Have you looked at using memory-mapped files? They allow files to be addressed as though they are part of the application's memory, even if they are larger than the actual available memory.

See the following links for more information on what they are:

These links are previous answers to questions about size limitations of memory-mapped files. Basically the file can be larger than the address space, but you may not be able to "view" all of it at once.
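As a rough illustration of what this answer describes, here is a POSIX-style sketch (the file name is a placeholder and error handling is minimal) that counts the lines of a file through a memory mapping instead of loading it into a container; on Windows the equivalent calls would be CreateFileMapping/MapViewOfFile.

    #include <fcntl.h>      // open
    #include <sys/mman.h>   // mmap, munmap
    #include <sys/stat.h>   // fstat
    #include <unistd.h>     // close
    #include <cstddef>
    #include <cstdio>

    int main() {
        const char* path = "input.txt";    // hypothetical file name
        int fd = open(path, O_RDONLY);
        if (fd < 0) { std::perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { std::perror("fstat"); return 1; }

        // The OS pages the file in and out on demand, so the mapping can cover
        // a file far larger than physical memory (given a 64-bit address space).
        void* addr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (addr == MAP_FAILED) { std::perror("mmap"); return 1; }

        const char* data = static_cast<const char*>(addr);
        std::size_t lines = 0;
        for (off_t i = 0; i < st.st_size; ++i)
            if (data[i] == '\n') ++lines;  // scan the file without storing it

        std::printf("%zu lines\n", lines);

        munmap(addr, st.st_size);
        close(fd);
    }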
