Our application needs to read a file that contains a single line, and that line holds a large amount of data. We read the line from the file into a String, tokenize the string on "-", and store the tokens in a list. Some entries of that list then have to be checked.

The method is as follows:

public boolean checkMessage(String filename) {
    boolean retBool = true;
    LinkedList tokenList;
    int size;
    String line = "";
    try {
        File file = new File(filename);
        FileInputStream fs = new FileInputStream(file);
        InputStreamReader is = new InputStreamReader(fs);
        BufferedReader br = new BufferedReader(is);
        // readLine() buffers the whole line in memory before returning it
        while ((line = br.readLine()) != null) {
            line = line.trim(); // trim() returns a new String
            tokenList = tokenizeString(line, "-");
            if (tokenList == null) {
                retBool = false;
                resultMsg = "Error in  File.java ";
            }
            if (retBool) {
                retBool = checkMessagePart(tokenList);
            }
        }
    } catch (IOException e) {
        retBool = false;
    }
    return retBool;
}

The error occurs at the line `while ((line = br.readLine()) != null)`.

The error is:

Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuffer.append(StringBuffer.java:322)
    at java.io.BufferedReader.readLine(BufferedReader.java:363)
    at java.io.BufferedReader.readLine(BufferedReader.java:382)

Increasing the heap size didn't help. The file we are trying to read is larger than 1 GB. We also tried reading it in chunks of bytes, but adding the data read to a StringBuilder or a list again produces the OutOfMemoryError.

Cœur
  • increase `maxheap` and try again – sidgate Jul 25 '16 at 11:12
  • Is it a JBoss application server? If yes, then increase the heap size (the -Xmx value) in run.conf and restart JBoss to try again. Or you can consider @sidgate's suggestion first. – Raja G Jul 25 '16 at 11:12
  • Possible duplicate of [How to deal with "java.lang.OutOfMemoryError: Java heap space" error (64MB heap size)](http://stackoverflow.com/questions/37335/how-to-deal-with-java-lang-outofmemoryerror-java-heap-space-error-64mb-heap) – Unknown Jul 25 '16 at 11:13
  • You should not read the whole line in memory. Stream the bytes in and work in chunks. – raphaëλ Jul 25 '16 at 11:14
  • I have tried that. The max heap size given is 2048M. –  Jul 25 '16 at 11:14
  • Why not try and get the tokens from the file instead? – npinti Jul 25 '16 at 11:14
  • How big is your line?! How many characters? You'll want to read it in chunks of bytes instead of reading the whole line. – Tunaki Jul 25 '16 at 11:15
  • Another idea: have you considered changing the file format? – GhostCat Jul 25 '16 at 11:18
  • Yes, it's a JBoss application server, and I have tried increasing the heap size. –  Jul 25 '16 at 11:22

1 Answer

If the problem is that you cannot read the file into a String, then don't do it. Read it token by token using some other method. The easy one is a Scanner with the right delimiter ("-" in your case). If you find its performance lacking, you could resort to implementing your own version of BufferedReader in which the "lines" are split on that character instead of the usual line terminators.
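
A minimal sketch of the Scanner approach, under a few assumptions: the class name `ScannerTokenCheck` and the rule in `isValidToken` are hypothetical stand-ins for the real per-token check, and the file is assumed to be UTF-8. The point is that each token is checked and then discarded, so memory use depends on the longest token rather than on the file size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;

public class ScannerTokenCheck {

    // Hypothetical per-token rule, standing in for the real check.
    public static boolean isValidToken(String token) {
        return token.matches("[A-Za-z0-9]+");
    }

    // Reads "-"-separated tokens one at a time and stops at the first invalid one.
    public static boolean checkTokens(Reader source) throws IOException {
        try (Scanner scanner = new Scanner(source)) {
            scanner.useDelimiter("-");
            while (scanner.hasNext()) {
                // trim() drops the line terminator that ends the single line
                String token = scanner.next().trim();
                if (!isValidToken(token)) {
                    return false;
                }
            }
            // Scanner swallows read errors, so surface them explicitly.
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // With a path argument the file is streamed; otherwise a small in-memory sample is used.
        try (Reader source = args.length > 0
                ? Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)
                : new StringReader("AA-BB-CC\n")) {
            System.out.println(checkTokens(source));
        }
    }
}
```

If only some entries need checking, a counter inside the loop can select them by position without keeping the earlier tokens around.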

Javier Martín
  • Yes indeed forgot about the scanner (deleted my post which used a "direct array-buffer") – raphaëλ Jul 25 '16 at 11:56
  • @raphaëλ Well, Scanner is handy but _if I remember correctly_ it can become a performance bottleneck, that's why I mentioned the other option which would be quite similar to what you suggested. Now that I think of it, I find it pretty absurd for BufferedReader and friends to not have overloads in which you can specify a character/regex to break lines at. – Javier Martín Jul 25 '16 at 12:51
  • Yes something like https://gist.github.com/rparree/ad1911f9e63373f66ace5a1f1a92ebb8, requires some nifty but not hard string manipulations – raphaëλ Jul 25 '16 at 12:54
  • @raphaëλ yeah, just like that. I just suggested wrapping that into a class that extended BufferedReader because if the OP is using Java 8 then the [BufferedReader.lines method](https://docs.oracle.com/javase/8/docs/api/java/io/BufferedReader.html#lines--) is quite interesting, providing a lazily-populated `Stream` with each "line". – Javier Martín Jul 25 '16 at 12:58
  • Thank you very much for your reply. I tried the code in https://gist.github.com/rparree/ad1911f9e63373f66ace5a1f1a92ebb8. As I need to split the data on '-' and put those segments into a list for some other checks, I tried adding them to a list, but that also throws an out-of-memory error after some iterations. –  Jul 27 '16 at 09:07
  • @Swapna Obviously. If the problem is that the full dataset is too big, you *need* to parse it in steps and remember a limited number of them at most. If you are adding every token you read to a list, that's still going to hold on to the memory. So stop trying to put lipstick on your pig and change the way you parse the data. If you have no other option but to hold on to the entirety of it, you'll need to raise the Java memory limit or even add more to your computer. – Javier Martín Jul 27 '16 at 10:20
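
The step-by-step parsing described in the last comment could look roughly like the sketch below. It is not the code from the linked gist: `DelimitedTokenReader` and `countTokensLongerThan` are hypothetical names, and the length check only stands in for whatever the real validation is. The reader returns one token per call, in the spirit of `readLine()` but splitting on a chosen character, and the caller keeps only a counter instead of a list of tokens.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class DelimitedTokenReader implements AutoCloseable {
    private final BufferedReader in;
    private final char delimiter;

    public DelimitedTokenReader(Reader source, char delimiter) {
        this.in = new BufferedReader(source);
        this.delimiter = delimiter;
    }

    // Returns the next token, or null once the input is exhausted.
    // A trailing delimiter does not produce an extra empty token.
    public String nextToken() throws IOException {
        StringBuilder token = new StringBuilder();
        boolean readAny = false;
        int c;
        while ((c = in.read()) != -1) {
            readAny = true;
            if (c == delimiter) {
                return token.toString();
            }
            token.append((char) c);
        }
        return readAny ? token.toString() : null;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Hypothetical check: counts tokens longer than maxLength while holding
    // only the current token and one counter in memory.
    public static long countTokensLongerThan(Reader source, int maxLength) throws IOException {
        long count = 0;
        try (DelimitedTokenReader tokens = new DelimitedTokenReader(source, '-')) {
            String token;
            while ((token = tokens.nextToken()) != null) {
                if (token.trim().length() > maxLength) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countTokensLongerThan(new StringReader("AA-BBBB-C-DDD\n"), 2));
    }
}
```

For a real file, the `StringReader` would be replaced by a reader over the file, for example one obtained from `Files.newBufferedReader`. If several neighbouring tokens have to be checked together, a small fixed-size window of recent tokens keeps memory bounded in the same way.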