
I tried the following sample [1], but my OMElement is too large (it wraps an 800 MB file that comes from another process), so I run into the following issues:

  • The process runs out of memory.
  • Serialization takes a long time.

Can anyone point me to the right solution?

[1]

 BufferedReader in = null;
 ByteArrayOutputStream baos = null;
 InputStream is = null;
 try {

    baos = new ByteArrayOutputStream();
    fileContent.serialize(baos);

    is = new ByteArrayInputStream(baos.toByteArray());

    in = new BufferedReader(new InputStreamReader(is));
Abimaran Kugathasan
Ratha
  • Why exactly do you need the input stream? Which other code are you passing it to? Does it _have_ to be an input stream or could you use some other technique, e.g. if your target API can act as a SAX content handler then you can go direct from the OM tree to that without having to go via plain XML on the way. – Ian Roberts May 09 '14 at 09:49
  • I need to read the content from the InputStream again in my custom class. 'fileContent' holds the OMElement. We can use another technique, but I need to read the content. My OMElement contains the content of a big file (a series of text characters). I read it line by line and process it in my custom class – Ratha May 09 '14 at 09:56
  • "Line by line" is a bit of a red flag when you're dealing with XML as it suggests you're trying to treat the XML as if it were plain text, and particularly so in this example as the one-arg `InputStreamReader` constructor will ignore the encoding declaration and always treat the stream as if it were in the default encoding for the current platform. If this is a custom class then why can't you simply pass it the `OMElement` directly and let it traverse the already-parsed XML tree itself? – Ian Roberts May 09 '14 at 10:41
  • I'm doing string manipulation and am not sure I can do that directly with the OMElement: `while (in.ready()) { StringBuffer output = new StringBuffer(); String row = in.readLine(); if (row.length() == 902) { append(output, row.substring(0, 4), false); append(output, row.substring(4, 64), false); append(output, row.substring(64, 73), false);` – Ratha May 09 '14 at 10:45

1 Answer


Unfortunately your question doesn't provide a clear description of the actual problem you are trying to solve. Instead it describes an issue with what you believe to be the solution to your problem. Therefore I can only try to reconstruct the problem based on the comments you made in response to Ian Roberts.

If my interpretation of these comments is correct, then the problem is as follows. You have an XML document that contains an element with a long sequence of characters, which is structured into multiple lines:

<some_element>
line 1
line 2
line 3
...
line N
</some_element>

You want to process the content of the element line by line, but N is large, so that you need to find a memory efficient way to do that, i.e. an approach that avoids loading the entire content into memory.

The code snippet you have provided shows that you took a wrong direction when trying to solve that problem. The code serializes the `OMElement` representing `some_element` and then creates an `InputStream`/`Reader` from the serialized output. However, that output would also contain the start and end tags for `some_element`, which is not what you want; you are only interested in the content of the element. If you look at the `OMElement` interface, you can see that it actually defines a method that returns that content as a `Reader`. It is called `getTextAsStream`, and the Javadoc explains how to use that method in such a way that the memory usage is O(1) instead of O(N).
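
As a rough illustration of that approach, here is a minimal sketch of the line-by-line loop. It assumes the `Reader` is obtained with `fileContent.getTextAsStream(false)`, where `false` asks Axiom not to cache the content; that call appears only in a comment so the example runs without Axiom on the classpath. The class `ElementTextLines` and the method `processLines` are hypothetical names chosen for this example, not part of Axiom.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Hypothetical helper, not part of Axiom: consumes element text line by line.
class ElementTextLines {

    /**
     * Reads the text from the given Reader one line at a time, hands each
     * non-empty line to the handler and returns the number of lines handled.
     * Only the current line is kept in memory by this method.
     */
    static long processLines(Reader text, Consumer<String> handler) throws IOException {
        BufferedReader in = new BufferedReader(text);
        long count = 0;
        String row;
        // readLine() returns null at end of stream. ready() only reports whether
        // a read would block, so it is not a reliable end-of-stream test.
        while ((row = in.readLine()) != null) {
            if (row.isEmpty()) {
                continue; // e.g. the line break right after the start tag
            }
            handler.accept(row);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // With Axiom available, the Reader would come from the element instead:
        //   Reader text = fileContent.getTextAsStream(false); // false: no caching
        // A StringReader stands in for it here so the sketch runs on its own.
        try (Reader text = new StringReader("\nline 1\nline 2\nline 3\n")) {
            long n = processLines(text, row -> System.out.println("row: " + row));
            System.out.println(n + " lines processed");
        }
    }
}
```

The handler is where the per-row string manipulation from the comments (the `substring` calls on 902-character rows) would go. Whether memory usage really stays constant depends on how the element was created and parsed, so the conditions described in the `getTextAsStream` Javadoc still need to be checked against the actual setup.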

Andreas Veithen
  • Sorry if my question was not clear. Your understanding is right. I have a big text file that contains a lot of lines (the file size is 1 GB). When it passes through the system (Apache Synapse), it arrives in the SOAP body. I want to read it line by line to process the file in my custom Java class. I get out-of-memory and serialization issues. I referred to your older post here [1] and thought there was no library for this. Thanks for the answer, I'll try it out. [1] http://stackoverflow.com/questions/8221892/get-an-inputstream-io-reader-from-omelement-object – Ratha May 12 '14 at 01:14