You've either got a memory leak or your understanding of the amount of string data that you are storing is incorrect. We can't tell which without seeing more of your code.
The scientific solution is to run your application using a memory profiler, and analyze the output to see which of your data structures is using an unexpectedly large amount of memory.
If I was to guess, it would be that your application (at some level) is doing something like this:
String line;
while ((line = br.readLine()) != null) {
// search for tag in line
String tagStr = line.substring(pos1, pos2);
// code as per your example
}
This uses a lot more memory than you'd expect. The substring(...)
call creates a tagStr
object that refers to the backing array of the original line
string. Your tag strings that you expect to be short actually refer to a char[]
object that holds all characters in the original line.
The fix is to do this:
String tagStr = new String(line.substring(pos1, pos2));
This creates a String object that does not share the backing array of the argument String.
UPDATE - this or something similar is an increasingly likely explanation ... given your latest data.
To expand on another of Jon Skeet's point, the overheads of a small String are surprisingly high. For instance, on a typical 32 bit JVM, the memory usage of a one character String is:
- String object header for String object: 2 words
- String object fields: 3 words
- Padding: 1 word (I think)
- Backing array object header: 3 words
- Backing array data: 1 word
Total: 10 words - 40 bytes - to hold one char
of data ... or one byte
of data if your input is in an 8-bit character set.
(This is not sufficient to explain your problem, but you should be aware of it anyway.)