I am experimenting with PipedInputStream and PipedOutputStream and can't understand why the following code exhausts the Java heap. All transient String objects it creates should be garbage-collected. Why, then, do I get an OutOfMemoryError?

I am trying to write and read 1000 String objects, each 1 million characters long. The code below fails about halfway through, even when invoked with -Xmx2g. What's more, the trace:

written string #453
read string #453
written string #454
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space

... reveals that the PipedInputStream is only one String object "behind" the PipedOutputStream. I don't see why garbage collection has failed to reclaim all necessary heap memory.

import java.io.*;
import java.util.*;


class Worker implements Runnable {

    private ObjectOutputStream oos;
    private PipedInputStream   pis;

    public Worker() throws IOException {
        this.pis = new PipedInputStream();
        this.oos = new ObjectOutputStream(new PipedOutputStream( pis ));
    }

    @Override
    public void run() {
        try {
            for (int i = 0 ; i < 1000 ; i++) {
                oos.writeObject(aBigString());
                System.out.printf("written string #%d\n", i);
            }
            oos.flush();
            oos.close();
        } catch (IOException e) {
            throw new RuntimeException(e.getMessage());
        }
    }

    private static String aBigString() {
        StringBuffer sb = new StringBuffer();
        for (int i = 0 ; i < 1000*1000 ; i++)
            sb.append("X");
        return sb.toString();
    }

    public PipedInputStream getInput() {
        return this.pis;
    }
}


public class FooMain {
    public static void main(String args[]) throws IOException, ClassNotFoundException {
        Worker worker = new Worker();
        (new Thread(worker)).start();
        ObjectInputStream ois = new ObjectInputStream(worker.getInput());
        String record = null;
        int i = 0;
        try {
            while (true) {
                record = (String) ois.readObject();
                System.out.printf("read string #%d\n", i++);
            }
        } catch (EOFException e) {
            ois.close();
            System.out.println("done.");
        }
    }
}
trincot
Marcus Junius Brutus
  • Have you tried StringBuilder instead of StringBuffer? – Fildor Jun 18 '13 at 11:23
  • Could be perm gen filling up, which is off heap. 32 bit JVM might not be able to address all that. http://stackoverflow.com/questions/1434779/maximum-java-heap-size-of-a-32-bit-jvm-on-a-64-bit-os – duffymo Jun 18 '13 at 11:34
  • Just to get the gist of your code: you start a thread that fills up a buffer within the `PipedInputStream` with a thousand strings of 1 million chars each while deserializing them at the same time? Those strings need to be in memory multiple times: as the string itself when created, as a byte[] when serialized in the pipe, and again as a string when read. This could take up to 6 GB in the worst case. – Thomas Jungblut Jun 18 '13 at 11:35
  • @ThomasJungblut: Why? wouldn't they be gc-ed? – Marcus Junius Brutus Jun 18 '13 at 11:46
  • @MarcusJuniusBrutus the input strings to the pipes will be collected when you `reset()`, but the internal buffer of your pipe will be collected only when you dereference it; as long as it is alive, it will carry the information of your 1k strings (in the worst case). – Thomas Jungblut Jun 18 '13 at 11:53
  • @ThomasJungblut the default PIPE_SIZE of PipedInputStream is 1024 bytes. So long as I call reset() on the ObjectOutputStream what else do I need to worry about? I don't see why any other pointers to objects should be kept. What am I missing? Otherwise, what else must I do to get rid of the accumulated objects? – Marcus Junius Brutus Jun 18 '13 at 13:03
  • @MarcusJuniusBrutus the buffer needs to grow if data wants to be inserted and if the buffer hasn't been emptied before. – Thomas Jungblut Jun 18 '13 at 13:10

2 Answers

This has nothing to do with the piped streams. You are hitting one of the classic pitfalls of the object streams. In order to preserve object identity, the streams hold on to every object that passes through them. If you need to use these streams for a large number of objects, you need to call reset() on the ObjectOutputStream periodically (but beware that object identities are not preserved across reset() calls).
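
As a rough illustration of that advice, here is a minimal sketch of the question's writer/reader pair with a reset() after every writeObject(). The class name `ResetDemo` and the helpers `bigString` and `roundTrip` are made up for this example, and resetting after every single object is just one possible policy; resetting every N objects would work as well.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class, not part of the original question.
class ResetDemo {

    /** Builds a string of the given length (stand-in for aBigString()). */
    static String bigString(int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'X');
        return new String(chars);
    }

    /**
     * Sends `count` strings of `length` chars through a pipe, calling
     * reset() after each write, and returns how many the reader received.
     */
    static int roundTrip(int count, int length) {
        try {
            PipedInputStream pis = new PipedInputStream();
            ObjectOutputStream oos =
                    new ObjectOutputStream(new PipedOutputStream(pis));

            Thread writer = new Thread(() -> {
                try (ObjectOutputStream out = oos) {
                    for (int i = 0; i < count; i++) {
                        out.writeObject(bigString(length));
                        // Clears the writer's record of objects already sent
                        // and tells the reader to do the same, so the strings
                        // become eligible for garbage collection.
                        out.reset();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();

            int read = 0;
            try (ObjectInputStream ois = new ObjectInputStream(pis)) {
                while (true) {
                    ois.readObject(); // result deliberately not retained
                    read++;
                }
            } catch (EOFException endOfStream) {
                // The writer closed its end of the pipe: nothing more to read.
            }
            writer.join();
            return read;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int received = roundTrip(100, 1_000_000);
        System.out.println("read " + received + " strings");
    }
}
```

The only change that matters compared with the question's code is the `out.reset()` line; the rest is the same pipe setup, condensed into one method so it can be run in isolation.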

jtahlborn
  • I just need to understand what the ramifications of using reset() and "not preserving object identities" are; but maybe that should be another question. – Marcus Junius Brutus Jun 18 '13 at 11:44
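
On the "object identities" point, a small sketch may help. It writes the same list twice, modifying it in between, once without and once with reset(). The class `ResetIdentityDemo` and its `roundTrip` method are invented for this illustration and use an in-memory byte array instead of a pipe, purely to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question.
class ResetIdentityDemo {

    /**
     * Writes one list twice (adding an element between the writes) and
     * reports what the reader sees: whether both reads return the same
     * instance, and the size of each list read.
     */
    static String roundTrip(boolean resetBetweenWrites) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
                ArrayList<String> list = new ArrayList<>();
                list.add("a");
                oos.writeObject(list);
                if (resetBetweenWrites) {
                    oos.reset(); // forget that `list` was already written
                }
                list.add("b");
                oos.writeObject(list);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                List<?> first = (List<?>) ois.readObject();
                List<?> second = (List<?>) ois.readObject();
                return "same=" + (first == second)
                        + " sizes=" + first.size() + "," + second.size();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + roundTrip(false));
        System.out.println("with reset:    " + roundTrip(true));
    }
}
```

Without reset(), the second write is only a back-reference, so the reader gets the same instance twice and never sees the added element. With reset(), the list is serialized again in full, so the reader gets two independent copies. For the strings in the question, which are never shared or rewritten, losing identity across reset() should not matter.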

I'd recommend downloading Visual VM, installing all the plugins, and attaching it to your PID while the code executes. It'll show you memory, threads, objects, CPU, and lots more.

duffymo