4

I am writing Java code to search for email addresses and passwords in a large text file (6-8 GB). The code works with a 200 MB text file and produces the expected output, but when I give it a 500 MB file it fails with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:331)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
at regular.expression_fyp.RegularExpression_fyp.main(RegularExpression_fyp.java:56)
Java Result: 1

I am new to Java programming, so any help would be appreciated. What should I do to solve this problem? Any suggestions are welcome; my code is attached below. Thank you.

import java.io.FileInputStream;

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression_fyp
{

   public static void main(String[] argv) throws Exception {

        String pattern = "\\w[%A-Za-z0-9-]+\\%40\\w+\\.com\\w[%A-Za-z0-9]+";
        Pattern r = Pattern.compile(pattern);

        FileInputStream input = new FileInputStream("E:\\test7.txt");
        FileChannel channel = input.getChannel();

        ByteBuffer bbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, (int) channel.size());
        CharBuffer cbuf = Charset.forName("8859_1").newDecoder().decode(bbuf);

        Matcher matcher = r.matcher(cbuf);

        if (matcher.find( )) {
            System.out.println("Found value: " + matcher.group(0) );

        } else {
            System.out.println("NO MATCH");
        }
    }
}
Peter Lawrey
Kevin Ð Alwis

4 Answers

4

The problem is that decoding into a CharBuffer converts the bytes to chars and thus brings the whole file into the heap. A more efficient solution is to write a wrapper for the ByteBuffer that lets you read the memory-mapped file directly.

You can create a CharSequence that wraps the ByteBuffer, so the whole mapping can be parsed without bringing it into the heap.

import java.nio.ByteBuffer;

/**
 * Assumes ISO-8859-1 character encoding
 */
public class BufferCharSequence implements CharSequence {
    final ByteBuffer bb;

    public BufferCharSequence(ByteBuffer bb) {
        this.bb = bb;
    }

    @Override
    public int length() {
        return bb.limit();
    }

    @Override
    public char charAt(int index) {
        return (char) (bb.get(index) & 0xFF);
    }

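    @Override
    public CharSequence subSequence(int start, int end) {
        // Work on a duplicate so the shared buffer's position and limit stay untouched.
        ByteBuffer copy = bb.duplicate();
        copy.limit(end);
        copy.position(start);
        return new BufferCharSequence(copy.slice());
    }

    @Override
    public String toString() {
        // Matcher.group() calls toString() on a subSequence, so copy the chars out.
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < length(); i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }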
}

Note: this will use <= 24 bytes of heap regardless of the capacity of the ByteBuffer.
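As a rough illustration of how such a wrapper could be used with the regex from the question, here is a minimal sketch. The class names MappedRegexSearch and Latin1View and the method findAll are hypothetical (they are not from the original post), and the sketch assumes the file is ISO-8859-1 and fits in a single mapping, which is limited to about 2 GB; a 6-8 GB file would have to be mapped in several overlapping regions.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: scans a memory-mapped ISO-8859-1 file with a regex. */
class MappedRegexSearch {

    /** Minimal ISO-8859-1 view over a ByteBuffer (same idea as BufferCharSequence). */
    static final class Latin1View implements CharSequence {
        private final ByteBuffer bb;

        Latin1View(ByteBuffer bb) {
            this.bb = bb;
        }

        @Override
        public int length() {
            return bb.limit();
        }

        @Override
        public char charAt(int index) {
            // Absolute get: reads one byte without moving the buffer's position.
            return (char) (bb.get(index) & 0xFF);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            ByteBuffer copy = bb.duplicate();
            copy.limit(end);
            copy.position(start);
            return new Latin1View(copy.slice());
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(length());
            for (int i = 0; i < length(); i++) {
                sb.append(charAt(i));
            }
            return sb.toString();
        }
    }

    /** Returns every match; only the matched text is copied onto the heap. */
    static List<String> findAll(Path file, Pattern pattern) throws IOException {
        List<String> matches = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumption: anything beyond Integer.MAX_VALUE bytes is ignored in this sketch.
            long size = Math.min(channel.size(), Integer.MAX_VALUE);
            ByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            Matcher matcher = pattern.matcher(new Latin1View(mapped));
            while (matcher.find()) {
                matches.add(matcher.group());
            }
        }
        return matches;
    }
}
```

Unlike the code in the question, which reports only the first hit, this loops on matcher.find() to collect all of them. It could be called as MappedRegexSearch.findAll(Paths.get("E:\\test7.txt"), r), where r is the compiled pattern from the question.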

Peter Lawrey
  • Thank you very much for your help. I appreciate it. – Kevin Ð Alwis Aug 28 '14 at 13:25
  • How can the encoding be changed, e.g. UTF-8? – Peter Keller Feb 27 '18 at 15:31
  • @PeterKeller That is more complex, as you need to handle multi-byte characters. Depending on your use case, if you want random access you might need to build an efficient tree structure. You could have an array which stores the offset of every nth character, e.g. every 256 characters, and scan linearly within those. Obviously you want your index to be off heap and to use less memory than the String itself. – Peter Lawrey Feb 28 '18 at 10:11
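The sparse-index idea from the comment above might look roughly like the following sketch. The class Utf8SparseIndex and its methods are hypothetical, not from the thread. It assumes well-formed UTF-8, indexes by code point rather than by Java char, and for brevity keeps the checkpoint array on the heap (the comment recommends off heap); the array costs only 4 bytes per 256 code points.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: random access to code points in a UTF-8 buffer via a sparse index. */
class Utf8SparseIndex {
    private static final int STRIDE = 256; // one checkpoint every 256 code points

    private final ByteBuffer bb;
    private final int[] checkpoints;       // byte offset of code point number k * STRIDE
    private final int codePointCount;

    Utf8SparseIndex(ByteBuffer bb) {
        this.bb = bb;
        int limit = bb.limit();
        // There are at most 'limit' code points, so this many checkpoints always suffices.
        int[] offsets = new int[limit / STRIDE + 1];
        int count = 0;
        for (int pos = 0; pos < limit; pos += sequenceLength(bb.get(pos))) {
            if (count % STRIDE == 0) {
                offsets[count / STRIDE] = pos;
            }
            count++;
        }
        this.checkpoints = offsets;
        this.codePointCount = count;
    }

    /** Number of bytes in the UTF-8 sequence starting with this lead byte (assumes valid input). */
    static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if (b >= 0xF0) return 4;
        if (b >= 0xE0) return 3;
        return 2;
    }

    /** Number of code points in the buffer. */
    int length() {
        return codePointCount;
    }

    /** Decodes the code point at the given code-point index. */
    int codePointAt(int index) {
        if (index < 0 || index >= codePointCount) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Jump to the nearest checkpoint, then scan forward at most STRIDE - 1 code points.
        int pos = checkpoints[index / STRIDE];
        for (int i = index % STRIDE; i > 0; i--) {
            pos += sequenceLength(bb.get(pos));
        }
        int len = sequenceLength(bb.get(pos));
        int first = bb.get(pos) & 0xFF;
        if (len == 1) {
            return first;
        }
        // Keep the payload bits of the lead byte, then append 6 bits per continuation byte.
        int cp = first & (0xFF >> (len + 1));
        for (int i = 1; i < len; i++) {
            cp = (cp << 6) | (bb.get(pos + i) & 0x3F);
        }
        return cp;
    }
}
```

Turning this into a CharSequence usable by java.util.regex would additionally require mapping supplementary code points to surrogate pairs, which this sketch leaves out.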
0

As already recommended, one good way to overcome the issue is to load the data from the file in smaller batches. But there is an alternative, for which you should understand how memory is allocated to Java programs:

The JVM is allocated a limited amount of memory at startup. To make things more complex, the JVM's memory has several different regions you can tune, but as your "java.lang.OutOfMemoryError: Java heap space" message indicates, we are interested in one particular region called the heap.

You can specify the maximum heap size yourself, as in the following example, which grants 1 GB of heap to a Java program:

java -Xmx1024m com.mycompany.MyApplication

If your JVM is already running, you can see the value of the parameter by, for example, checking the output of the jps command, which lists the startup parameters. Among them you see the familiar -Xmx again, setting the maximum allowed heap to 1 GB:

my-machine:demo me$ jps -lvm
6116 com.mycompany.MyClass -Xmx1024m

If you have not specified it yourself, a platform-specific default is used. You can check its value by, for example, running java with the -XX:+PrintFlagsFinal option, which reports sizes in bytes. In this example the maximum heap size is again exactly 1 GB, or 1073741824 bytes:

my-machine:demo me$ java -XX:+PrintFlagsFinal |grep MaxHeapSize
uintx MaxHeapSize                              := 1073741824      {product} 
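The limit can also be read from inside a running program. The following is a small sketch using Runtime.maxMemory(); the class name HeapInfo is hypothetical. Note that the reported value is typically a little below the -Xmx setting, and can be Long.MAX_VALUE if the JVM reports no inherent limit.

```java
/** Hypothetical helper: reports the heap limit of the current JVM. */
class HeapInfo {

    /** Maximum amount of memory the JVM will attempt to use, in mebibytes. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```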

So, even though batch loading can and will help, sometimes it is easier to solve a problem just by throwing more resources at it. When you next face a "java.lang.OutOfMemoryError: Java heap space" error, you can sometimes get around it simply by increasing the memory available to the JVM.
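For comparison, the batch-loading alternative mentioned in this answer could be sketched as below: the file is read one line at a time, so heap use depends on the longest line rather than on the file size. The class LineByLineSearch and the method scan are hypothetical names, and the sketch assumes ISO-8859-1 input, lines that fit comfortably in memory, and matches that never span a line break.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: applies a regex to a large file one line at a time. */
class LineByLineSearch {

    /** Passes each match to onMatch and returns the number of matches found. */
    static long scan(Path file, Pattern pattern, Consumer<String> onMatch) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            Matcher matcher = pattern.matcher(""); // reused for every line
            String line;
            while ((line = reader.readLine()) != null) {
                matcher.reset(line);
                while (matcher.find()) {
                    onMatch.accept(matcher.group());
                    count++;
                }
            }
        }
        return count;
    }
}
```

A call such as LineByLineSearch.scan(Paths.get("E:\\test7.txt"), r, System.out::println) would print every match without needing a larger heap.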

Flexo
Ivo
-1

Have you tried decreasing the file buffer size? A more optimized approach may be needed: it looks like your buffer is being filled with the entire 6 GB file, and that is what is crashing your app.

You could also try increasing your JVM's heap size by running your code with java -Xms[initial heap size] -Xmx[maximum heap size].

Check this answer and see if it helps.

Community
  • Thank you for your great answer! I really appreciate your help! I have now added -Xmx1000m to my project, and it works well. I think this is what you meant? Thank you again! – Kevin Ð Alwis Aug 28 '14 at 13:35
-1

Thank you, everyone, for your great contributions! Since I am using NetBeans, I found another way today: I went to the project properties and, under Run, added -Xmx1000m to the VM options. Now my program works fine. But I want to know whether this can cause errors in the future, because I am supposed to make this program executable, so it should run on other Windows machines as well. Will this change cause any problems for me in the future?

Kevin Ð Alwis