
I have a ByteArrayOutputStream that holds the bytes of a 750 MB XML document.

I need to convert it to String.

I wrote:

ByteArrayOutputStream xmlArchive = ...
String xmlAsString = xmlArchive.toString(UTF8);

However, even with a 4 GB heap, I get java.lang.OutOfMemoryError: Java heap space

What is wrong? How can I work out which heap size to use? I am using a 64-bit JDK.

UPDATE

I need it as a String in order to remove all the characters before "<?xml".

Currently my code is:

String xmlAsString = xmlArchive.toString(UTF8);
int xmlBegin = xmlAsString.indexOf("<?xml");
if (xmlBegin > 0) {
    return xmlAsString.substring(xmlBegin);
}
return xmlAsString;

I then convert it back to a byte array.
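Since the result is converted back to bytes anyway, the String round trip can be skipped by searching the bytes directly. The sketch below is one way to do that, assuming the payload is UTF-8 (or any ASCII-compatible encoding): "<?xml" is pure ASCII, and UTF-8 never uses ASCII byte values inside a multi-byte character, so a raw byte search cannot match in the middle of a character. The class and method names (`StripBeforeXmlDecl`, `indexOf`, `writeFromXmlDecl`) are hypothetical. In the question's code, `data` would be `xmlArchive.toByteArray()` (itself a 750 MB copy, but far smaller than a 1.5 GB String plus its substring), and `out` is wherever the bytes finally go.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StripBeforeXmlDecl {

    // "<?xml" is pure ASCII, so its UTF-8 encoding is exactly these five bytes;
    // UTF-8 never uses ASCII byte values inside multi-byte sequences, so a raw
    // byte search cannot produce a false match in the middle of a character.
    static final byte[] MARKER = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Naive search: index of the first occurrence of needle, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Same behaviour as the String code in the question: drop everything
    // before the first "<?xml", or keep all bytes if it does not occur.
    static void writeFromXmlDecl(byte[] data, OutputStream out) throws IOException {
        int start = indexOf(data, MARKER);
        if (start < 0) {
            start = 0;
        }
        out.write(data, start, data.length - start);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "junk\r\n<?xml version=\"1.0\"?><a/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeFromXmlDecl(data, out);
        System.out.println(out.toString("UTF-8")); // prints <?xml version="1.0"?><a/>
    }
}
```

No character decoding happens at all, so peak memory is the byte array plus whatever `out` retains.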

UPDATE 2

The ByteArrayOutputStream is written like this:

HttpMethod method ..
InputStream response = method.getResponseBodyAsStream();
byte[] buf = new byte[5000];
while ((len = response.read(buf)) != -1) {
    output.write(buf, 0, len);
}

`len` is initialized from the response's Content-Length header (inside the loop it is then reassigned to the number of bytes each `read` returns).
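Because the bytes arrive through this loop anyway, the preamble could be dropped while copying, so it never has to be buffered. This is a copy-loop variant of the stream-filtering idea that laune and Joop Eggen raise in the comments; it is a sketch, and the names `CopyFromXmlDecl` and `copyFromMarker` are hypothetical. It assumes an ASCII-compatible encoding such as UTF-8, and unlike the question's code it writes nothing (and returns false) when "<?xml" never appears.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CopyFromXmlDecl {

    static final byte[] MARKER = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Copies in to out, discarding every byte before the first "<?xml".
    // Returns false, having written nothing, if the marker never appears.
    // Resetting to 0 or 1 on a mismatch is only correct because no byte in
    // "<?xml" repeats; an arbitrary marker would need KMP-style fallback.
    static boolean copyFromMarker(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buf = new byte[bufferSize];
        int matched = 0;        // marker bytes matched so far
        boolean found = false;
        int len;
        while ((len = in.read(buf)) != -1) {
            int from = 0;
            if (!found) {
                int i = 0;
                while (i < len && matched < MARKER.length) {
                    byte b = buf[i++];
                    if (b == MARKER[matched]) {
                        matched++;
                    } else {
                        matched = (b == MARKER[0]) ? 1 : 0;
                    }
                }
                if (matched < MARKER.length) {
                    continue;   // still in the preamble: drop this chunk
                }
                found = true;
                out.write(MARKER);  // the marker may have straddled two chunks
                from = i;
            }
            out.write(buf, from, len - from);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "HTTP junk\n<?xml version=\"1.0\"?><root/>".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A tiny buffer forces the marker to be split across reads.
        copyFromMarker(new ByteArrayInputStream(body), out, 4);
        System.out.println(out.toString("UTF-8")); // prints <?xml version="1.0"?><root/>
    }
}
```

In the question's setting the call would look something like `copyFromMarker(method.getResponseBodyAsStream(), output, 5000)`. If `output` is a file or the eventual consumer rather than a ByteArrayOutputStream, the 750 MB never needs to sit on the heap at all.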

Stephen C
Dejell
  • Do you really need it as a string in memory? What are you going to do with it afterwards? Bear in mind that `ByteArrayOutputStream.toString()` always uses the platform-default encoding, which probably isn't a good idea. – Jon Skeet Jul 08 '14 at 12:30
  • "Bear in mind that ByteArrayOutputStream.toString() always uses the platform-default encoding, which probably isn't a good idea": doesn't passing UTF-8 as the argument affect that? I will update my question. – Dejell Jul 08 '14 at 12:32
  • xmlArchive.toString(UTF8); – Dejell Jul 08 '14 at 12:34
  • Each char of a String takes 2 bytes. Apart from that: why do you need this String at all? – laune Jul 08 '14 at 12:35
  • Right. Assuming the bytes are genuinely a UTF-8 representation, that's better... but I would still try to avoid doing this anyway. What are you really trying to achieve? Having a 1.5GB char array in memory really isn't going to scale well... – Jon Skeet Jul 08 '14 at 12:35
  • @laune I updated my question. I get the ByteArrayOutputStream and need to remove the bytes before the XML. – Dejell Jul 08 '14 at 12:37
  • xmlArchive: 0.75 GB + xmlAsString 1.5 GB = 2.25 GB. Did you try `new ByteArrayOutputStream(758_000_000)`? By the way, ByteBuffer or CharSequence might be interesting too. – Joop Eggen Jul 08 '14 at 12:40
  • @JoopEggen I need to convert it to a String to remove the characters before "<?xml" – Dejell Jul 08 '14 at 12:43
  • You might use writeTo(x), where x is a PipedOutputStream connected to a PipedInputStream (in another thread), which will then, hopefully, write out only the stuff after "<?xml" – laune Jul 08 '14 at 12:45
  • @laune do you suggest writing it to a file and then reading it from it? – Dejell Jul 08 '14 at 12:49
  • We need to see how the `ByteArrayOutputStream` is being written to. – Jamie Cockburn Jul 08 '14 at 12:50
  • No, not in order to eliminate the garbage preceding "<?xml" – laune Jul 08 '14 at 12:51
  • @Dejel And what are you doing finally with this data? – Jamie Cockburn Jul 08 '14 at 12:57
  • Piped I/O of @laune is fine, and a nice mental exercise. Extending an InputStream wrapping the original InputStream is possible too. It reads till `<?xml` – Joop Eggen Jul 08 '14 at 12:59
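A rough peak-memory estimate for the String-based code in the question, putting the comments' figures together. It assumes Java 8-era strings (2 bytes per char; with Java 9+ compact strings an all-Latin-1 document would take half that), a payload of mostly single-byte UTF-8 characters, and that `xmlAsString` is still reachable while `substring` builds its result (since Java 7u6, `substring` copies the characters instead of sharing the array):

```latex
\[
\underbrace{0.75\ \text{GB}}_{\texttt{byte[]}\ \text{in the BAOS}}
+ \underbrace{750 \times 10^{6} \times 2\ \text{B}}_{\texttt{xmlAsString}\ (=\,1.5\ \text{GB})}
+ \underbrace{\lesssim 1.5\ \text{GB}}_{\texttt{substring}\ \text{copy}}
\approx 3.75\ \text{GB}
\]
```

The first two terms alone are roughly the 2.2 GB Joop Eggen computes. The total is already close to a 4 GB heap before counting the spare capacity left by the ByteArrayOutputStream's doubling growth and any temporary buffers used while decoding.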

2 Answers


You could use the Scanner class to read the response stream directly, without buffering it in a ByteArrayOutputStream first:

Scanner scanner = new Scanner(response, StandardCharsets.UTF_8.name());

// skip to "<?xml"; (?s) lets '.' also match line breaks in the preamble
scanner.skip("(?s).*?(?=<\\?xml)");

// process rest of stream
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // Do something with line
}
scanner.close();
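A self-contained check of the skip pattern against an in-memory stream. The `(?s)` flag matters here: without it `.` does not match line terminators, so when the junk before "<?xml" contains a newline the pattern cannot match at the current position and `skip` throws NoSuchElementException (it also throws if "<?xml" is missing entirely). The class and method names are hypothetical, and collecting lines into a list is only for the demo. Note that `nextLine()` strips line terminators.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSkipDemo {

    // Returns the lines from "<?xml" onward, with the preamble skipped.
    static List<String> linesFromXmlDecl(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, StandardCharsets.UTF_8.name())) {
            // (?s) makes '.' match \r and \n, so a multi-line preamble is skipped
            scanner.skip("(?s).*?(?=<\\?xml)");
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine()); // line terminator is dropped
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String body = "status: ok\r\nmore junk\n<?xml version=\"1.0\"?>\n<root/>\n";
        List<String> lines = linesFromXmlDecl(
                new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
        System.out.println(lines); // prints [<?xml version="1.0"?>, <root/>]
    }
}
```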
Jamie Cockburn

Expanding on Jamie Cockburn's answer:

To fill in his while loop to match your expected behaviour:

byte[] buf = line.getBytes(StandardCharsets.UTF_8);
output.write(buf, 0, buf.length);
output.write('\n'); // nextLine() strips the line terminator, so restore it
Cruncher