
I have a program that parses hundreds of thousands of files, stores data from each file, and at the end prints some of the extracted data into an Excel document.

These are some of the memory-related errors I encountered, and how I handled them:

  1. java.lang.OutOfMemoryError: Java heap space. Fix: increased the heap to 2 GB.

  2. Error occurred during initialization of VM. Could not reserve enough space for 2097152KB object heap. Fix: downloaded the 64-bit JRE 8 and set -d64 as one of the default VM arguments.

  3. java.lang.OutOfMemoryError: GC overhead limit exceeded. Fix: increased the Java heap from 2 GB to 3 GB and added the argument -XX:-UseGCOverheadLimit.

So now my default VM arguments are: -d64 -Xmx3g -XX:-UseGCOverheadLimit
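
For reference, a full launch line with those defaults would look something like this (the jar name here is just a placeholder):

    java -d64 -Xmx3g -XX:-UseGCOverheadLimit -jar file-parser.jar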

The issue is that my program runs for several hours reading in and storing all of the information I need from these files, and then, if a memory error occurs while it is trying to print everything at the end, all of that work is lost.

What I'm wondering is whether there is a way to store the extracted data so that I can access it again even if the program exits due to an error, ideally in the same form I use it in within the program.

For instance, say I have several hundred thousand files of user records. I traverse all of them, store the data I extract in user objects, and keep these and other personally defined objects in HashMaps and LinkedLists. Is there a way to persist these user objects, HashMaps, and LinkedLists so that, even if the program exits due to an error, I can write another program that goes through whatever was stored so far and prints out the information I want, without having to repeat the whole process of reading in, extracting, and storing the information?

cques
  • Which library are you using to write your Excel file? If you use Apache POI, you will have a bad time with both memory and time. I would recommend creating a CSV file instead. – Luiggi Mendoza Aug 19 '14 at 20:19
  • Have you considered serializing your objects to a file every once in a while? For example, whenever the list grows past 1,000, then 10,000, etc. (a sketch follows these comments). http://www.mkyong.com/java/how-to-write-an-object-to-file-in-java/ – luanjot Aug 19 '14 at 20:21
  • Maybe you can add a daemon thread that saves the pointer to an unmanaged memory address offset to the HDD or some database, then keeps itself alive until the new process is started. The new process then reads the address from the database and accesses that address by unsafe means. – huseyin tugrul buyukisik Aug 19 '14 at 20:21
  • Most probably you are creating too many objects; try to reuse the same instances. – eldjon Aug 19 '14 at 20:35
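
A rough sketch of the periodic-serialization idea from the comments, using a made-up record type and directory name; each chunk file can later be read back with ObjectInputStream:

    import java.io.*;
    import java.util.ArrayList;
    import java.util.List;

    public class CheckpointingParser {
        static final int CHECKPOINT_SIZE = 10_000;     // flush every N records

        public static void main(String[] args) throws IOException {
            List<String> records = new ArrayList<>();  // stand-in for real record objects
            int chunk = 0;
            File[] files = new File("data").listFiles();
            if (files == null) return;                 // directory missing or unreadable
            for (File f : files) {
                records.add(f.getName());              // stands in for the real parsing
                if (records.size() >= CHECKPOINT_SIZE) {
                    save(records, new File("chunk-" + (chunk++) + ".ser"));
                    records.clear();                   // free what was just written
                }
            }
            if (!records.isEmpty()) {
                save(records, new File("chunk-" + chunk + ".ser"));
            }
        }

        static void save(List<String> records, File out) throws IOException {
            try (ObjectOutputStream oos = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(out)))) {
                oos.writeObject(records);
            }
        }
    }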

2 Answers


One way of doing so is called serialization. (What is object serialization?).
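
For example, a minimal sketch, assuming your User class implements Serializable (the User fields below are hypothetical; HashMap and LinkedList already implement Serializable):

    import java.io.*;
    import java.util.HashMap;

    // Hypothetical User type; your real class only needs "implements Serializable".
    class User implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        User(String name) { this.name = name; }
    }

    public class Snapshot {
        // Dump the whole map to disk before (or instead of) the risky printing step.
        static void save(HashMap<String, User> users, File f) throws IOException {
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(new FileOutputStream(f)))) {
                out.writeObject(users);
            }
        }

        // A separate program can load the snapshot and print it without re-parsing.
        @SuppressWarnings("unchecked")
        static HashMap<String, User> load(File f) throws IOException, ClassNotFoundException {
            try (ObjectInputStream in = new ObjectInputStream(
                    new BufferedInputStream(new FileInputStream(f)))) {
                return (HashMap<String, User>) in.readObject();
            }
        }
    }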

However, depending on your data, you could just write your information into a handy XML file, and after extracting all the data, simply load the XML and proceed from there.
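
On Java 8 (which the question is using), JAXB still ships with the JRE, so a sketch along those lines could look like this; the UserRecord type and its fields are hypothetical:

    import javax.xml.bind.JAXB;
    import javax.xml.bind.annotation.XmlRootElement;
    import java.io.File;

    // JAXB needs a no-arg constructor and a root annotation; public fields
    // are mapped by default.
    @XmlRootElement
    public class UserRecord {
        public String name;
        public int fileCount;

        public static void main(String[] args) {
            UserRecord u = new UserRecord();
            u.name = "alice";
            u.fileCount = 3;
            JAXB.marshal(u, new File("user.xml"));    // write the object out as XML
            UserRecord back = JAXB.unmarshal(new File("user.xml"), UserRecord.class);
            System.out.println(back.name);            // prints "alice"
        }
    }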

Hope that helps.

Stoiker

First of all, it is very rare to need this much text data in memory at the same time; usually you can process each file's data iteratively and discard it.

If you really need to operate on this much data, consider using a map-reduce framework (such as those that Google provides). It will solve both speed and memory problems.

Finally, if you are really sure you can't solve your problem in either of those two ways, or if the map-reduce setup is not worth it to you, then your only option is to write the data to a file somewhere. A good way to serialize your data is to use JSON; Google's Gson and Jackson 2 are popular libraries for this.
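
For instance, a small sketch with Gson (the User type here is hypothetical; the TypeToken is needed to recover the map's generic type on the way back in):

    import com.google.gson.Gson;
    import com.google.gson.reflect.TypeToken;
    import java.io.*;
    import java.lang.reflect.Type;
    import java.util.HashMap;

    public class JsonSnapshot {
        // Hypothetical record type standing in for your own classes.
        static class User {
            String name;
            User(String name) { this.name = name; }
        }

        public static void main(String[] args) throws IOException {
            HashMap<String, User> users = new HashMap<>();
            users.put("u1", new User("alice"));

            Gson gson = new Gson();
            try (Writer w = new FileWriter("users.json")) {
                gson.toJson(users, w);                    // stream the map to disk as JSON
            }

            Type mapType = new TypeToken<HashMap<String, User>>() {}.getType();
            try (Reader r = new FileReader("users.json")) {
                HashMap<String, User> back = gson.fromJson(r, mapType);
                System.out.println(back.get("u1").name);  // prints "alice"
            }
        }
    }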

nmore