3

I have a method to read and parse an extremely long xml file. The xml file is read into a string, which then is parsed by a different class. However, this causes the Java to use a large amount of memory (~500 MB). Normally, the program runs at around 30 MB, but when parse() is called, it increases to 500 MB. When parse() is done running, however, the memory usage doesn't go back down to 30 MB; instead it stays at 500 MB.

I've tried setting s = null and calling System.gc() but the memory usage still stays at 500 MB.

public void parse(){
        try {
            System.out.println("parsing data...");
            String path = dir + "/data.xml";
            InputStream i = new FileInputStream(path);
            BufferedReader reader = new BufferedReader(new InputStreamReader(i));
            String line;
            String s = "";
            while ((line = reader.readLine()) != null){
                s += line + "\n";
            }

            ... parse ...

        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
}

Any ideas?

Thanks.

CharithJ
  • 46,289
  • 20
  • 116
  • 131
Jason Yang
  • 130
  • 8
  • 1
    Do not concatenate strings. Use [`StringBuilder`](https://docs.oracle.com/javase/tutorial/java/data/buffers.html) instead, – PM 77-1 Aug 02 '15 at 01:36
  • 1
    That "... parse ..." leaves a lot of space for other memory usage. Are you sure the string s is your problem? – Don Roby Aug 02 '15 at 01:41
  • @DonRoby Yes, I'm sure, because I deleted everything else, but the problem still remains. Also, I've commented the loop, and the problem disappears. – Jason Yang Aug 02 '15 at 01:43
  • 1
    @PM77-1 Thanks, that helped a lot. The memory usage dropped to around 90 MB. While it doesn't return to 30 MB when done, as long as I don't call parse() 100 times, I should be fine! – Jason Yang Aug 02 '15 at 01:50
  • @JasonYang: Memory usage dropped because it creates less string instances now. However, there is a memory leak unless you close the stream. – CharithJ Aug 02 '15 at 09:48

4 Answers4

2

Solution for your memory leak

You should Close the BufferReader at the end in order to close the stream and releases any system resources associated with it. You can close both InputStream and BufferReader. However, closing the BufferReader actually closes its stream as well.

Generally it's better to add a finally and close it.

finally 
{
   i.Close();
   reader.Close();
}

Better approach try-with-resources Statement

try (BufferedReader br = new BufferedReader(new FileReader(path))) 
{
        return br.readLine();
}

Bonus Note

Use a StringBuilder instead of concatenating strings

String does not allow appending. Each append/concatenate on a String creates a new object and returns it. This is because String is immutable - it cannot change its internal state.

On the other hand StringBuilder is mutable. When you call Append, it alters the internal char array, rather than creating a new string object.

Thus it is more memory efficient to use a StringBuilder when you want to append many strings.

CharithJ
  • 46,289
  • 20
  • 116
  • 131
0

Just a note: a try-with-resources block will help you a lot with IO objects like those readers.

try(InputStream i = new FileInputStream(path);
    BufferedReader reader = new BufferedReader(new InputStreamReader(i))) {
    //your reading here
}

This will make sure these objects are disposed of by calling close() on them, regardless of how your method block exits (success, exception...). Closing these objects may also help to free up some memory.

The thing that's probably causing a big slowdown and probably blowup of memory usage, though, is your string concatenation. Calling s += line + "\n" is fine for a single concatenation, but the + operator actually has to create a new String instance each time, and copy the characters from the ones being concatenated. The StringBuilder class was designed just for this purpose. :)

Oly
  • 2,329
  • 1
  • 18
  • 23
  • Note on implementation: the reason the `StringBuilder` doesn't have the same problems as repeated `String` concatenation using `+` is because the underlying data is mutable, and the same object is used to build up the string. The detail is that there's an underlying collection of char arrays (as far as I recall) which hold the string to be built in chunks. What this means is that the only time each of these characters is copied is 1) when `append()` is called to the `StringBuilder` and 2) when `toString()` is called, as opposed to once for every single iteration of the loop. – Oly Aug 02 '15 at 02:11
  • The different you describe between using `+` for string concatenation and StringBuilder was true in earlier versions of Java, but is no longer true. See: http://stackoverflow.com/a/47628/1057429 – Nir Alfasi Aug 02 '15 at 02:43
  • @alfasin I don't see anything in that answer that supports your claim. – user207421 Aug 02 '15 at 04:12
  • @EJP my old friend, thanks for joining us! That answer shows that the byte code to which `a+=b` compiles is equivalent to creating a `StringBuilder` and using its `append` method. – Nir Alfasi Aug 02 '15 at 04:21
  • 1
    @alfasin This is definitely true, in the case of multiple appends _in the same expression in the code_. This means there's no performance penalty for doing `"a" + "b" + "c" + ...` over using a `StringBuilder`. The issue arises when we're iterating - using the inbuilt `+` operator creates a new `StringBuilder` _each time_. This can result in a huge performance penalty. It would be a difficult task, beyond the call of duty, for a compiler to even attempt to optimise that kind of thing. – Oly Aug 02 '15 at 12:13
  • I missed the fact that you're referring to a loop. right! – Nir Alfasi Aug 02 '15 at 14:14
0

The 500MB is caused by parsing, so it has nothing to do with the string, or the BufferedReader either. It is the DOM of the parsed XML. Release that and your memory usage will revert.

But why read the entire file into a string? This is a waste of time and space. Just parse the input directly from the file.

user207421
  • 305,947
  • 44
  • 307
  • 483
0

You should keep in mind that calling System.gc(); will not definitely do the Garbage collection but it suggest GC to do it's thing and it can ignore doing that if GC dont want to garbage collect. it is better to use StringBuilder do reduce the number of Strings you create in memory because it only creates String when you call toString() on it.

RoHaN
  • 1,356
  • 2
  • 11
  • 21