I have a program that loads data into a DB through named pipes, and it works very well. It has been running for about 2 years and accepts plain text or gzip files.

Now some zip files have shown up to be loaded, and I want to extend the program to handle them. But I can't get it to work: I'm getting an OutOfMemoryError.

(Of course, I'm calling this using -Xms512M -Xmx2048M)

Below is how I get the InputStream:

PipeLoader.java

protected BufferedReader getBufferedReader(File file, String compression) throws Exception {
    BufferedReader bufferedReader = null;

    if(compression.isEmpty())   {
        bufferedReader = new BufferedReader(new FileReader(file), BUFFER);
    } else if(compression.equalsIgnoreCase("gzip")) {
        InputStream fileStream = new FileInputStream(file);
        InputStream gzipStream = new GZIPInputStream(fileStream);

        // Works fine
        Reader reader = new InputStreamReader(gzipStream);
        bufferedReader = new BufferedReader(reader, BUFFER);
    } else if(compression.equalsIgnoreCase("zip")){
        InputStream fileStream = new FileInputStream(file);
        ZipInputStream zipStream = new ZipInputStream(fileStream);
        zipStream.getNextEntry(); // For testing purposes I'm getting only the first entry

        Reader reader = new InputStreamReader(zipStream); // Works only with small zips
        bufferedReader = new BufferedReader(reader, BUFFER);
    }

    return bufferedReader;
}

I also tried the TrueVFS library:

// The same: works with small zip files, OutOfMemoryError with big zip files
TFile tFile = new TFile(file);
TFileInputStream tfis = new TFileInputStream(new TFile(tFile.getAbsolutePath(), tFile.list()[0]));

Reader reader = new InputStreamReader(tfis);
bufferedReader = new BufferedReader(reader, BUFFER);

And yes, I'm closing everything properly (remember, it works with gz!).

In this case I need to load a zip file with only one plain text file inside (~4 GB zipped, ~35 GB unzipped).

I got the OutOfMemoryError on the first file, less than 1 minute after starting.

PS: This is not a duplicate of "Reading a huge Zip file in java - Out of Memory Error". There, the asker had the option to read each of the small files inside the zip, whereas I have only one big file.

I ran with -XX:+HeapDumpOnOutOfMemoryError and read the .hprof file with Memory Analyzer, but it didn't help me much:

MemoryAnalyser.png

Please, I need help.

Eric Sant'Anna
  • So, the file is a text file with lots of line breaks? From the stacktrace it looks like `readLine()` tries to fit a very large portion of file into one array, which should suggest that there are no (or very few) line breaks. – Steinar Apr 14 '14 at 23:13
  • I hadn't thought of that! I don't know what the file looks like; they just gave me the layout. Probably whoever is transmitting the files to me is doing it wrong, as usual... Thanks, tomorrow I'll come back with news. – Eric Sant'Anna Apr 14 '14 at 23:25
  • Have you tried with TrueZip: https://truezip.java.net/truezip-path/index.html – sendon1982 Apr 14 '14 at 23:43
  • @sendon1982 I tried with TrueVFS (TrueVFS replaces the TrueZIP project) – Eric Sant'Anna Apr 15 '14 at 14:36
  • @Steinar Good catch! You're right! Someone put | (pipes) instead of \n. In the heat of battle I hadn't thought of that, because this was my first time working with zip. I'm not sure, but could you rewrite your comment as an answer? – Eric Sant'Anna Apr 15 '14 at 14:43

1 Answer

If you look at the stacktrace, you can see that BufferedReader.readLine() ultimately leads to the creation of a very large array, which is causing the OutOfMemoryError.

Since readLine() keeps reading the input until it reaches a line break, this indicates that there are no (or very few) line breaks in the zipped input file.
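
If the records turn out to be separated by some other character (the comments above report | in place of \n), one way around readLine() is to read character by character and cut records at that delimiter. The following is only a minimal sketch under that assumption: the class name DelimitedRecordReader, the method forEachRecord, and the maxRecordChars guard are hypothetical names introduced here for illustration, not part of the asker's program. The reader passed in could be the one built on top of the ZipInputStream in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: streams delimiter-separated records without readLine().
public class DelimitedRecordReader {

    /**
     * Reads from 'in' and hands each record (text between delimiters) to 'sink'.
     * Only one record is held in memory at a time. If a single record grows past
     * maxRecordChars, an IOException is thrown instead of exhausting the heap,
     * which usually means the assumed delimiter is wrong.
     * Returns the number of records delivered.
     */
    public static long forEachRecord(Reader in, char delimiter, int maxRecordChars,
                                     Consumer<String> sink) throws IOException {
        // Wrap in a BufferedReader so single-char reads stay cheap.
        BufferedReader reader = (in instanceof BufferedReader)
                ? (BufferedReader) in
                : new BufferedReader(in);
        StringBuilder record = new StringBuilder();
        long count = 0;
        int c;
        while ((c = reader.read()) != -1) {
            if (c == delimiter) {
                sink.accept(record.toString());
                record.setLength(0); // reuse the builder for the next record
                count++;
            } else {
                if (record.length() >= maxRecordChars) {
                    throw new IOException("Record " + (count + 1) + " exceeds "
                            + maxRecordChars + " chars; is the delimiter correct?");
                }
                record.append((char) c);
            }
        }
        if (record.length() > 0) { // trailing record without a final delimiter
            sink.accept(record.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        List<String> records = new ArrayList<>();
        long n = forEachRecord(new StringReader("a;1|b;2|c;3"), '|', 1024, records::add);
        System.out.println(n + " " + records); // prints "3 [a;1, b;2, c;3]"
    }
}
```

The size guard is a design choice rather than a requirement: it turns a silent heap blow-up into an early, readable error when the input does not contain the expected delimiter.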

Steinar