I am parsing huge XHTML files and trying to work with their content: the words, their positions, and so on. I tried using HashMap, ArrayList, etc., but all of them give an OutOfMemoryError after loading about 130,347 entries. What kind of data structure can hold this much data in Java?
-
I don't think it's the data type that's the issue, I think you're trying to do "too much at once." If you're dealing with a large enough amount of data that it's essentially causing memory issues, you might want to break the steps apart and do it in chunks. – Michael Todd May 05 '10 at 18:30
-
I am getting this problem with a 5 MB file. I am targeting support for files up to 10 MB. – Rachel May 06 '10 at 16:19
4 Answers
What you are doing now, sucking all your data into one huge structure and then processing it, is not going to work regardless of what data structure you use. Try an incremental approach where you read some data, then process it, then read some more, etc. (Actually what you'd be doing this way is creating your own special-purpose data structure that handles the processing in chunks, so my first sentence isn't really accurate.)
One way to do this might be to parse the document using SAX, which uses an event-driven approach. Your content handler can build objects from the XML elements as they are read, process them once enough have accumulated, then clear the collection.
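A minimal sketch of that approach, assuming you only want to pull the words out of the document; the chunk size and the processChunk placeholder are assumptions you would replace with your own per-word logic:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.helpers.DefaultHandler;

public class ChunkedWordHandler extends DefaultHandler {

    private static final int CHUNK_SIZE = 10000;        // assumption: tune to your heap
    private final List<String> buffer = new ArrayList<String>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                  // accumulate element text
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Split the text collected for this element into words
        for (String word : text.toString().trim().split("\\s+")) {
            if (!word.isEmpty()) {
                buffer.add(word);
            }
        }
        text.setLength(0);
        if (buffer.size() >= CHUNK_SIZE) {
            processChunk();
        }
    }

    @Override
    public void endDocument() {
        processChunk();                                  // flush whatever is left
    }

    private void processChunk() {
        // Placeholder: do whatever per-word work you need (positions, counts, ...)
        System.out.println("Processing " + buffer.size() + " words");
        buffer.clear();                                  // free memory before reading more
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File(args[0]), new ChunkedWordHandler());
    }
}
```

Because only one chunk of words is in memory at a time, the peak footprint stays roughly constant no matter how large the file is.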

Look into your virtual machine memory settings. You can modify the VM memory size via the command line if that's how you launch the app, or via a config file if you are in some kind of server-side environment.
If you are using Tomcat/Eclipse, this thread should help you: Eclipse memory settings when getting "Java Heap Space" and "Out of Memory"
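As a quick sanity check, you can print how much heap the JVM was actually granted; a small sketch (the 512m value below is just an illustrative flag):

```java
public class HeapCheck {
    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use, in bytes
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
        // Run with e.g.  java -Xmx512m HeapCheck  to raise the limit
    }
}
```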
-
Good point. If you're running your app from the command line, you can pass something like -Xmx4G to allow it to use 4 gigabytes of memory. – intgr May 05 '10 at 18:35
-
Your question is pretty vague. But if you run out of memory then you should probably use an on-disk database instead. PostgreSQL, MySQL, HSQLDB, whatever.
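If you go that route, an embedded HSQLDB file database is a low-setup option. A rough sketch, assuming the hsqldb jar is on the classpath; the JDBC URL, table layout, and word/position columns are just illustrative:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class WordStore {
    public static void main(String[] args) throws Exception {
        // "wordstore" is a hypothetical file path; HSQLDB creates the files on first use
        Connection conn = DriverManager.getConnection("jdbc:hsqldb:file:wordstore", "SA", "");
        Statement st = conn.createStatement();
        st.execute("CREATE TABLE words (word VARCHAR(255), pos INT)");

        PreparedStatement ps = conn.prepareStatement("INSERT INTO words VALUES (?, ?)");
        conn.setAutoCommit(false);          // batch inserts and commit once: far fewer disk syncs
        // insert each word as you parse it, e.g.:
        ps.setString(1, "example");
        ps.setInt(2, 0);
        ps.addBatch();
        ps.executeBatch();
        conn.commit();

        // query whatever you need for the request, then clean up when it is done
        st.execute("DROP TABLE words");
        conn.close();
    }
}
```

Batched inserts inside a single transaction are the main thing to get right; inserting row by row with autocommit on is what makes this approach slow.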

-
Do you mean that the information I collect from the document can be written into HSQLDB, with a proper schema, on the local disk instead of being loaded into memory, so that I can query what I need on an as-needed basis? Since I only need it for that one request, do I have to delete my inserts at the end of processing? This is quite interesting; I have not used HSQLDB in real-time applications. Could you please tell me the trade-offs of this kind of solution, especially performance, since I will have to insert a huge amount of data with many calls? – Rachel May 06 '10 at 16:28
-
Which database would you suggest for loading data temporarily for a request and clearing it at the end of the request? – Rachel May 06 '10 at 16:42
-
A 10 MB XML file is by no means "huge data", so a disk database is probably overkill. – intgr May 15 '10 at 17:01