1

Is there any mechanism in Java to reduce the memory usage while reading large text files?

Almost every program I've come across uses String to read text files. But Java reserves space for each String literal, which is why I think memory usage increases as all the String objects are stored. All the classes in java.io deal with String. But if we're not using StringBuilder, then how can we reduce memory usage?

After all, reducing memory usage is the primary purpose of StringBuilder [since it's mutable, unlike String]. So how can we exploit this feature in Java I/O operations without using String, i.e. without something like this: sb.append([String object]);

Debadyuti Maiti
  • Usually the secret behind processing large files is not trying to read them entirely to memory, and you don't need StringBuilder for that. – Joni Mar 24 '12 at 17:32
  • What is the concern? Are you trying to read the entire file into a string? – ring bearer Mar 24 '12 at 17:33
  • String literals don't have anything to do with file i/o – josefx Mar 24 '12 at 17:34
  • This question does not make any sense to me. If you want to keep a file in memory then you have to pay the price for it, regardless of the language and/or runtime environment you use. You should provide a concrete example. – home Mar 24 '12 at 17:38
  • Hmm, "a little knowledge is a dangerous thing." – Kirk Woll Mar 24 '12 at 17:46
  • @DaveNewton Each time we call br.readLine(), since it returns a String, the number of temporary String objects [which get stored in the String constant pool by the JVM] in memory increases. That's what I wanted to avoid by finding a solution with StringBuilder. – Debadyuti Maiti Mar 25 '12 at 11:26
  • It's not a constant if it's not a constant. – Dave Newton Mar 25 '12 at 12:31
  • @DaveNewton But String is immutable. That's why there's a chance of creating lots of temporary String objects. – Debadyuti Maiti Mar 25 '12 at 13:00
  • Which is different than using the string constant pool. If you don't want to use strings, use byte buffers. Have you actually profiled anything to even see if you care about the relatively small performance/memory improvements you'll make? – Dave Newton Mar 25 '12 at 14:04
  • @DaveNewton Well, according to Kathy Sierra [in SCJP book], whenever we're trying something like this : "String s = new String("abc"); // creates two objects, // and one reference variable. In this case, because we used the new keyword, Java will create a new String object in normal (nonpool) memory, and s will refer to it. In addition, the literal "abc" will be placed in the pool." That's what I wanted to point out i.e. there's always one duplicate literal for each String unlike StringBuilder. – Debadyuti Maiti Mar 25 '12 at 14:11
  • You're not creating a string literal when you read from a file, because there's no literal--I'm not sure why you don't see that. In order for there to be a string literal, there has to be a literal--in your example, there is--it's the `"abc"`. You're not doing that when you read from a file. – Dave Newton Mar 25 '12 at 14:55
  • @DaveNewton OK, now I'm getting it. This is a code snippet from BufferedReader's readLine() method: String str; if (s == null) { str = new String(cb, startChar, i - startChar); } else { s.append(cb, startChar, i - startChar); str = s.toString(); } ... str is returned from that method. So, if I'm right, no String literal is actually being created here when performing "str = new String(cb, startChar, i - startChar);" or calling StringBuffer's toString() method. Right? – Debadyuti Maiti Mar 25 '12 at 15:46

6 Answers

1

Assume you have n strings, each of length 1 that you read from your input - for simplicity.

Using operator + on strings while reading creates a new String object each time you concatenate, so you get intermediate strings of length 1, 2, 3, ..., n.

So the total memory allocated for those strings is 1 + 2 + ... + n = O(n^2), in addition to the n strings you read from the input.

If you use StringBuilder to create the final string instead, you only create n objects for the input [each of length 1] and one object for the final string of size n, so the total memory usage is 1 + 1 + ... + 1 + n = O(n).

So even if you use sb.append(String), the space usage is asymptotically better than creating all the intermediate strings, since you do not need to create intermediate String objects.

In addition, the performance [time] should be better when using StringBuilder, both because you create fewer objects and because of the lower memory usage: the gc doesn't need to work as hard as when concatenating strings naively.

Note that the above still holds for strings of any length.
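As a rough illustration of the two approaches compared above, here is a minimal sketch. The class and method names (`ConcatDemo`, `joinWithPlus`, `joinWithBuilder`, `charsCopiedByPlus`) are made up for this example, and the count is the idealized 1 + 2 + ... + n figure for n one-character parts, not a measurement of what a particular JVM allocates.

```java
class ConcatDemo {
    // Naive concatenation: every += builds a new String holding everything read so far.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder: parts are copied into one growing buffer, one String is made at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    // Idealized number of characters held by the intermediate strings
    // when n parts of length 1 are joined with +: 1 + 2 + ... + n.
    static long charsCopiedByPlus(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        String[] parts = {"a", "b", "c", "d"};
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
        System.out.println(charsCopiedByPlus(parts.length));
    }
}
```

Both methods return the same string; only the amount of intermediate garbage differs.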

amit
0

You can use StringBuilder's append(char) method to avoid creating intermediate strings; see this post: https://stackoverflow.com/a/9849624/102483. Keep in mind that there is no way to reduce the memory footprint of the final String below the size of the file you are reading.

Hiro2k
0

Depending on what you are doing, you could create a pool of String and/or StringBuilder objects that are loaded with the values you need, cleared out and then reused (only StringBuilder objects can actually be cleared and refilled, since String is immutable). You could configure the pool to grow to a maximum size, and if objects in the pool are no longer used, drop the references to them so that the garbage collector can eventually reclaim them.
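The simplest form of this reuse is a single StringBuilder that is cleared between lines instead of a full pool. The sketch below assumes that is enough for the task; `ReusedBuilderDemo` and `countLinesContaining` are hypothetical names, and the caller is expected to pass a buffered Reader, because the method reads one character at a time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReusedBuilderDemo {
    // Counts the lines that contain `needle`, without creating a String per line.
    static int countLinesContaining(Reader in, String needle) {
        StringBuilder line = new StringBuilder(); // one buffer, reused for every line
        int matches = 0;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '\n') {
                    if (line.indexOf(needle) >= 0) {
                        matches++;
                    }
                    line.setLength(0); // clear the contents, keep the allocated capacity
                } else if (c != '\r') {
                    line.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The last line may not end with a newline.
        if (line.indexOf(needle) >= 0) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        Reader in = new StringReader("ok\nERROR one\r\nfine\nan ERROR");
        System.out.println(countLinesContaining(in, "ERROR"));
    }
}
```

The builder grows to the length of the longest line and stays there, so memory use depends on line length rather than file size.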

jhenderson2099
0

You might want to consider something like this:

  BufferedReader reader = 
    new BufferedReader(
      new InputStreamReader(
        new ByteArrayInputStream(data)));
  String line;

  while ((line = reader.readLine()) != null)
    ...

See these links for more details:

BufferedReader for large ByteBuffer?

http://www.tutorialspoint.com/java/java_bytearrayinputstream.htm
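A readLine() loop like the one above keeps memory low only if each line is processed and then dropped rather than accumulated. Here is a minimal sketch of that idea, assuming the per-line work is just counting; `StreamingLineCount` and `countLinesAndChars` are hypothetical names, and a StringReader stands in for the real file or byte array.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class StreamingLineCount {
    // Returns {lineCount, charCount}. Only the current line is referenced at any time,
    // so earlier lines become eligible for garbage collection as the loop advances.
    static long[] countLinesAndChars(Reader source) {
        long lines = 0;
        long chars = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                chars += line.length(); // line terminators are not included by readLine()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {lines, chars};
    }

    public static void main(String[] args) {
        long[] result = countLinesAndChars(new StringReader("first line\nsecond\n"));
        System.out.println(result[0] + " lines, " + result[1] + " characters");
    }
}
```

Appending every line to a StringBuilder instead would again require memory proportional to the whole input.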

paulsm4
  • Putting this into a StringBuilder causes an OutOfMemoryError on Android for strings only 2.5% of the total memory size. – Michael Dec 30 '14 at 03:06
0

Reader and its subclasses are based around char and char[]; only the convenience methods use String. Since StringBuilder.append() accepts char[], you can avoid creating unnecessary String objects if you only use the methods built around char[].

Note that while this reduces the number of temporary String objects created, the overall memory requirements stay the same, since the gc would collect any temporary String created otherwise.
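A minimal sketch of the char[]-based approach, using only Reader.read(char[]) and StringBuilder.append(char[], int, int). The names `CharBufferRead` and `readAll` and the 8192-character buffer size are arbitrary choices for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class CharBufferRead {
    // Copies everything from the reader into a StringBuilder via a reusable char[] buffer,
    // so no temporary String is created along the way.
    static StringBuilder readAll(Reader in) {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                sb.append(buffer, 0, n); // append only the characters actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb;
    }

    public static void main(String[] args) {
        StringBuilder text = readAll(new StringReader("hello\nworld"));
        System.out.println(text.length());
    }
}
```

The resulting StringBuilder still holds the full content, so this avoids temporary objects without shrinking the final footprint.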

josefx
0

Instead of String, try using StringBuilder to append data read from a file. If you use String you might end up creating multiple string objects in memory.

Wilduck
Pramod