
Java Strings are immutable, so when you create a String, a block of memory is allocated for it on the heap, and when you "change" its value, a new block of memory is allocated for the new String and the old one becomes eligible for garbage collection (once nothing else refers to it). For example:

String str = func1_return_big_string_1(); // not a literal
str = func2_return_big_string_2();        // not a literal
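A minimal Java sketch of the situation described above. `makeString` is a hypothetical stand-in for `func1_return_big_string_1()` / `func2_return_big_string_2()`; it builds its result at run time, so the String is an ordinary heap object rather than an interned literal. Reassigning `str` only redirects the reference: the first String is neither overwritten nor reused.

```java
final class ImmutableStringDemo {
    // Hypothetical stand-in for func1/func2: builds a String at run time,
    // so the result is an ordinary heap object, not an interned literal.
    static String makeString(int n) {
        return new StringBuilder("big string ").append(n).toString();
    }

    public static void main(String[] args) {
        String str = makeString(1);
        String alias = str;               // a second reference to the same object

        str = makeString(2);              // str now points at a brand-new object

        System.out.println(alias);        // prints "big string 1": the old object is unchanged
        System.out.println(str);          // prints "big string 2"
        System.out.println(alias == str); // prints "false": two distinct objects on the heap
        // Once alias also goes away, "big string 1" becomes unreachable and
        // the GC may reclaim it later; nothing here reuses its memory.
    }
}
```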

But garbage collection takes time to kick in, so in practice the heap holds both big string 1 and big string 2 for a while. This can be an issue for me if it happens a lot.

Is there a way to make big string 2 reuse the memory of big string 1, so that no extra space is needed when big string 2 is assigned to str?

Edit: Thanks for all the input. In the end I realized I shouldn't expect Java code to behave like C++ code (i.e., to have the same memory footprint). I wrote a C++11 demo that works as expected: the peak memory footprint is around 20 MB (the size of the biggest file I was loading), with the rvalue reference and the move assignment operator kicking in as expected. The demo below was built in VS2012 with C++11.

#include "stdafx.h"
#include <iostream>
#include <string>
#include <vector>
#include <fstream>
#include <thread>
using namespace std;

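string readFile(const string &fileName)
{
    // ios::ate opens the stream positioned at the end, so tellg() gives the size
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);
    if (!ifs)
        return string(); // file could not be opened

    ifstream::pos_type fileSize = ifs.tellg();
    ifs.seekg(0, ios::beg);
    if (fileSize <= 0)
        return string(); // empty file: &bytes[0] would be invalid below

    vector<char> bytes(static_cast<size_t>(fileSize));
    ifs.read(&bytes[0], fileSize);

    return string(&bytes[0], bytes.size());
}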

class test{
public:
    string m_content;
};

int _tmain(int argc, _TCHAR* argv[])
{
    string base("c:\\data");
    string ext(".bin");
    string filename;
    test t;
    //std::this_thread::sleep_for(std::chrono::milliseconds(5000));
    cout << "about to start" << endl;
    for(int i=0; i<=50; ++i) {
        cout << i << endl;
        filename = base + std::to_string(i) + ext;
        //rvalue reference & move assignment operator here
        //so no unnecessary copy at all
        t.m_content = readFile(filename);
        cout << "size of content: " << t.m_content.length() << endl;
    }
    cout << "end" << endl;
    system("pause");
    return 0;
}
RoundPi
  • These aren't string *literals* we're talking about, right? Presumably the big strings are returned from some method, correct? – arshajii Oct 20 '13 at 16:18
  • Use a char array instead? Looking through the API, I only see String.replace() as a likely candidate, but that requires you to use a regex on it, and I don't know how the memory allocation works for that. – EyeOfTheHawks Oct 20 '13 at 16:19
  • @arshajii: yes thats right – RoundPi Oct 20 '13 at 16:21
  • Secondly, have you done any tests to determine that there is really a memory issue here? The GC should be pretty efficient when it comes to this. – arshajii Oct 20 '13 at 16:23
  • No, but I can foresee this happening; at least, I don't want to rely on the GC if I don't have to. – RoundPi Oct 20 '13 at 16:26
  • How big are the strings? Are we talking thousands, millions, or billions of characters here? – meriton Oct 20 '13 at 16:40
  • +1 @arshajii. Unless you're doing a _lot_ of work, the GC is smarter than you, and if you _are_ doing a lot of work, you should be using a `StringBuilder`. – Louis Wasserman Oct 20 '13 at 22:12
  • @LouisWasserman: Can StringBuilder be used to assign? I don't need appending. – RoundPi Oct 20 '13 at 23:30
  • @Gob00st: Depends what you mean by "assign," but probably. Have you read its docs? – Louis Wasserman Oct 20 '13 at 23:55
  • @LouisWasserman: It typically uses append() a lot, which is not what I want. – RoundPi Oct 21 '13 at 00:17
  • @Gob00st: _How_ do you want to modify the string? If you just want to replace the value of the first string, trust that the GC is smarter and faster than anything you could do. – Louis Wasserman Oct 21 '13 at 01:04
  • @Gob00st: Take a look at my answer. Everybody downvotes, but it is the only correct answer here. – Martijn Courteaux Oct 21 '13 at 20:34

5 Answers


Use StringBuffer, StringBuffer.append()
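StringBuilder is the unsynchronized counterpart of StringBuffer with the same methods. A minimal sketch, assuming the goal is to replace (not append to) the contents: `setLength(0)` clears the builder but keeps its backing array, so later content reuses that memory as long as it fits. The `assign` helper is hypothetical.

```java
final class ReuseStringBuilder {
    // Hypothetical helper: replaces the builder's contents in place.
    // setLength(0) discards the old characters but keeps the backing array,
    // so the append below reuses it while the new content fits the capacity.
    static void assign(StringBuilder sb, CharSequence newContent) {
        sb.setLength(0);
        sb.append(newContent);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder(1024); // pre-size for the largest expected content
        assign(sb, "big string 1");
        int capacityBefore = sb.capacity();

        assign(sb, "big string 2, a bit longer");
        System.out.println(sb);                               // prints "big string 2, a bit longer"
        System.out.println(sb.capacity() == capacityBefore);  // prints "true": no reallocation
    }
}
```

Note that `sb.toString()` still copies the characters into a new String, so this only avoids allocation when the consumer can work with the CharSequence directly.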

  • I think he wants to completely reassign `str` to a new string, not somehow mutate a preexisting one. – arshajii Oct 20 '13 at 16:24
  • Is it possible in Java? Java's Strings are immutable. The only option to make them mutable is to use (like @Praveen suggested) StringBuffer or StringBuilder class. – Ernestas Kardzys Oct 20 '13 at 16:28

I see several options:

  1. Use char[].
  2. Copy StringBuilder into your own version, MyStringBuilder, with a public reusable buffer. The major disadvantage is that it lacks regexes. That's what I did when I needed to boost performance.
  3. Hack for JDK <= 6: there is a non-public constructor that reuses strings / wraps char buffers. It's gone in JDK 7+. One needs to be really cautious with this, though it's not a problem once you have a C/C++ background.
  4. Copy String into a MutableString with a public reusable buffer. I don't think there would be a problem adding your own regex matcher, as plenty of them are available.
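A rough sketch of options 1 and 4 combined, assuming the data arrives through a `java.io.Reader`: a hypothetical `ReusableCharBuffer` (not a JDK or library class) keeps one `char[]` and refills it on every load, exposing the contents as a CharSequence. It is not thread-safe and, unlike a String, its contents change under any code still holding a reference to it.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

// Hypothetical class (not from any library): a CharSequence backed by a single
// char[] that is refilled in place on every load instead of allocating a new String.
final class ReusableCharBuffer implements CharSequence {
    private char[] buf;
    private int len;

    ReusableCharBuffer(int initialCapacity) {
        buf = new char[Math.max(1, initialCapacity)];
    }

    // Replaces the current contents with everything read from 'in'.
    // The backing array is reused and grows only when the new data does not fit.
    // The caller remains responsible for closing 'in'.
    void load(Reader in) throws IOException {
        len = 0;
        int n;
        while ((n = in.read(buf, len, buf.length - len)) != -1) {
            len += n;
            if (len == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // grow; rare if pre-sized well
            }
        }
    }

    int capacity() {
        return buf.length;
    }

    @Override
    public int length() {
        return len;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= len) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + len);
        }
        return buf[index];
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > len || start > end) {
            throw new IndexOutOfBoundsException("[" + start + ", " + end + "), length " + len);
        }
        return new String(buf, start, end - start); // copies the requested range
    }

    @Override
    public String toString() {
        return new String(buf, 0, len); // copies; avoid calling this on huge contents
    }

    public static void main(String[] args) throws IOException {
        ReusableCharBuffer b = new ReusableCharBuffer(64);
        b.load(new StringReader("big string 1"));
        System.out.println(b);            // prints "big string 1"
        b.load(new StringReader("big string 2"));
        System.out.println(b);            // prints "big string 2"
        System.out.println(b.capacity()); // prints "64": the same array was reused
    }
}
```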
Andrey Chaschev
  • I don't need to worry about regexes for this particular case, so could you give an example of your option 2? – RoundPi Oct 20 '13 at 23:34
  • Here is a [sample](https://github.com/chaschev/chutils/blob/22be4d0d1a872733b115c7d5995c552a3a6ca9f4/src/main/java/chaschev/lang/OpenStringBuilder.java). `reset()` method was added and a few others to reuse the `StringBuilder`. – Andrey Chaschev Oct 21 '13 at 09:16

It shouldn't really matter for non-interned Strings. If you start running out of memory, the garbage collector will remove any objects that are no longer referenced.

Interned Strings are much harder to collect; see Garbage collection of String literals for details.

EDIT: A non-interned String is just like a normal object. Once there are no more references to it, it becomes eligible for garbage collection.

If str is the only reference left pointing to the original String and str is changed to point to something else, then the original String is eligible for garbage collection. So you don't have to worry about running out of memory: the JVM will collect it if the memory is needed.
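To illustrate the interned / non-interned distinction: string literals are interned by the JVM, while a String built at run time is an ordinary heap object until `intern()` is called on it. The `build` helper below is hypothetical.

```java
final class InternDemo {
    // Hypothetical helper: concatenates at run time, so the result is a new,
    // non-interned String object on the heap.
    static String build(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        String literal = "hello";            // literals are interned by the JVM
        String built = build("hel", "lo");   // ordinary heap object

        System.out.println(built.equals(literal));     // prints "true": same characters
        System.out.println(built == literal);          // prints "false": different objects
        System.out.println(built.intern() == literal); // prints "true": intern() returns the pooled instance
    }
}
```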

dkatzel

In order to avoid having both the old and new String at the same time in memory, you can explicitly allow the GC to clean it up by assigning null to the variable:

String str;
str = func1_return_big_string_1();
str = null; // Now, GC can clean, when it needs extra memory for the String.
str = func2_return_big_string_2();
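The same idea applied in a loop, as a sketch: `loadBigString` is a hypothetical stand-in for the big-string producers, and `processAll` clears the previous reference before producing the next one. Whether and when the GC actually reclaims the old String is up to the JVM; nulling only removes this reference as an obstacle.

```java
import java.util.Arrays;

final class NullBeforeReload {
    // Hypothetical stand-in for func1/func2: returns a String of 'size' copies of 'fill'.
    static String loadBigString(int size, char fill) {
        char[] chars = new char[size];
        Arrays.fill(chars, fill);
        return new String(chars);
    }

    // Produces 'count' big strings one after another. 'current' is declared
    // outside the loop, so without the explicit null it would keep the previous
    // String reachable while the next one is being built.
    static long processAll(int count, int size) {
        long total = 0;
        String current = null;
        for (int i = 0; i < count; i++) {
            current = null;                                   // drop the old reference first
            current = loadBigString(size, (char) ('a' + i % 26));
            total += current.length();                        // stand-in for real work
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(processAll(5, 1_000_000)); // prints "5000000"
    }
}
```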

UPDATE: To support my claim, I wrote a test case that proves I'm right: http://ideone.com/BwGfSN. Using a finalizer, the code demonstrates the difference between these two cases:

GCTest test;
// Without the null assignment
test = create(0);
test = create(1);
test = null;
System.gc();

try {Thread.sleep(10);} catch (Exception e){}
System.out.println();

// With the null assignment
test = create(2);
test = null;
test = create(3);
test = null;
System.gc();
Martijn Courteaux
  • Yes, you can. By calling `System.gc()`. But that is not what I'm showing here. The idea is to release the previous String object, in order to allow the GC to kick in while creating the string within `func2`. – Martijn Courteaux Oct 21 '13 at 05:36
  • Funny to see how everybody is downvoting, while this is really the technical answer to his problem. – Martijn Courteaux Oct 21 '13 at 11:34
  • Maybe they downvote because the JVM should do this `null` assignment for the user? The JIT can work out that `str` is no longer referenced in the current evaluation scope, so its value can be GC-freed. – Andrey Chaschev Oct 21 '13 at 13:06
  • @AndreyChaschev: No, that is not true. While in the `func2_return_big_string_2()` method, the reference to the other big string (stored in `str`) still exists. Only when the method returns is the value of `str` overwritten. So, while the next big string is being created, the GC can't free the previous one. Therefore, assigning `null` allows the GC to get rid of it while `func2_return_big_string_2()` is allocating the next one. – Martijn Courteaux Oct 21 '13 at 17:55
  • There is also the JIT, which can optimize the code so it looks as it is stated in your answer. It's a correct optimization. I've re-read the original question and, yes, it mistakenly states that two strings would be on the heap. My guess is that there will be only one after the JIT optimization; it really kicks in, and the code sample is quite trivial. What it cannot optimize, however, is the memory allocation needed in each of the methods. – Andrey Chaschev Oct 21 '13 at 20:12
  • The JIT does NOT do this optimization, at least not on my JVM. Look carefully at this code: http://ideone.com/BwGfSN. – Martijn Courteaux Oct 21 '13 at 20:24
  • In this case it really does not. Even after I added `final`s and put it into the loop. And this is really hard to explain. Anyway, this is something new for me, upvoting :-) – Andrey Chaschev Oct 21 '13 at 22:45
  • @MartijnCourteaux: I tried nulling the String without calling System.gc() and it did help; I guess the GC kicked in for the nulled String? Shouldn't the GC kick in even without explicitly nulling the String, since it's no longer referenced? – RoundPi Oct 22 '13 at 15:21
  • @Gob00st: You are right. The GC will kick in without calling System.gc(), because it detects that the JVM is running out of memory. The difference is that the GC cannot clean up the old String as long as the second method is running, because while the second method is running, the reference to the old String still exists. Only when the method returns and overwrites the str reference is the old String no longer referenced. So, if you null the reference before entering the second method, the GC can free the memory of the old String while the second method is building the next big String. – Martijn Courteaux Oct 22 '13 at 15:28
  • @MartijnCourteaux: This one seems to save the most memory so far. But even with all the nulling and System.gc(), it still doesn't seem to get rid of all the unreferenced strings. So basically, instead of a steady memory usage of around 20 MB (as when I write this in C++11), it spikes up to 600 MB and then drops to around 180 MB with your method (without any extra copies, my Java test app should peak at 20 MB). I guess this is Java, not C++: a bumpy ride instead of a smooth one :) – RoundPi Oct 22 '13 at 17:02
  • @Martijn Courteaux: I guess, in general, you are right, but note that the JIT never gets invoked in your test case. It only kicks in if significant time is spent in the code. I could imagine that it could eliminate your null assignment. See a [related question of mine](http://stackoverflow.com/questions/19547284/can-jit-do-this-field-access-optimization). – maaartinus Oct 23 '13 at 17:07
  • @maaartinus: I think you simply don't really know what JIT is (was). JIT is the compiler that compiles the bytecode into native machinecode. It is not "the optimizer". It can do optimizations, because it's the compiler. – Martijn Courteaux Jan 29 '14 at 22:20
  • @MartijnCourteaux: Could we agree that JIT is the *optimizing compiler*? Could we also agree that JVM can run without it? If not, please take a look [here](http://www.herongyang.com/JVM/Micro-Benchmark-Interpreted-Only-Mode.html). The first few iterations are always interpreted, agreed? – maaartinus Jan 30 '14 at 03:03

I have just found a MutableString implementation. It is available in Maven Central. Here is an extract from their JavaDoc page:

  • Mutable strings occupy little space: their only attributes are a backing character array and an integer;
  • their methods try to be as efficient as possible: for instance, if some limitation on a parameter is implied by limitation on array access, we do not check it explicitly, and Bloom filters are used to speed up multi-character substitutions;
  • they let you access directly the backing array (at your own risk);
  • they implement CharSequence, so, for instance, you can match or split a mutable string against a regular expression using the standard Java API;
  • they implement Appendable, so they can be used with Formatter and similar classes.
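The CharSequence point holds for any CharSequence, not only MutableString: `java.util.regex.Pattern` matches and splits CharSequence inputs directly, so no String copy is needed for the regex work itself. A small JDK-only sketch, using StringBuilder as the stand-in; `countMatches` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexOnCharSequence {
    // Counts matches of 'p' directly on any CharSequence (StringBuilder,
    // CharBuffer, a MutableString, ...) without converting it to a String first.
    static int countMatches(Pattern p, CharSequence text) {
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("id=12, id=345, id=6");
        System.out.println(countMatches(Pattern.compile("\\d+"), sb));  // prints "3"
        // Pattern.split also accepts a CharSequence (the pieces it returns are new Strings).
        System.out.println(Pattern.compile(",\\s*").split(sb).length); // prints "3"
    }
}
```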

UPDATE

You can use the Appendable interface of this MutableString to read a file with almost zero memory overhead (8KB, which is the default buffer size in Java). With Guava's CharStreams.copy it looks like this:

MutableString str = new MutableString((int) file.length());
CharStreams.copy(Files.newReaderSupplier(file, Charset.defaultCharset()), str);
System.out.println(str);

Full working example.
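For comparison, a JDK-only sketch of what the Guava call does: stream characters from a Reader into any Appendable through one fixed buffer. The `copy` method here is hypothetical (it is not Guava's), and the 8192-char buffer size is an arbitrary choice. The InputSupplier-based Guava API used above was later deprecated in Guava in favor of CharSource, so this version avoids the dependency.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.CharBuffer;

final class CopyToAppendable {
    // Hypothetical helper: streams characters from 'in' into 'out' through one
    // fixed-size buffer, so the only extra memory is the buffer itself.
    // Returns the number of characters copied; does not close 'in'.
    static long copy(Reader in, Appendable out) throws IOException {
        char[] buf = new char[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.append(CharBuffer.wrap(buf, 0, n)); // CharBuffer is a CharSequence view of buf
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder(64); // pre-size, e.g. roughly from the file size
        long n = copy(new StringReader("contents of a big file"), sb);
        System.out.println(n + ": " + sb); // prints "22: contents of a big file"
    }
}
```

For a real file, a Reader from `java.nio.file.Files.newBufferedReader(path, StandardCharsets.UTF_8)`, opened in try-with-resources, can be passed in place of the StringReader.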

Andrey Chaschev
  • Great! Accepting an answer motivates our future contributions! :-) – Andrey Chaschev Oct 22 '13 at 16:26
  • Though it's not saving much, because when the MutableString does the append, it copies the char array loaded from the file into its own array. I was hoping it could do move semantics like an rvalue reference in C++11, i.e., point its array at the temporary object's array (loaded from the file), but it doesn't do the smart thing... Anyway, an upvote for you. Cheers – RoundPi Oct 22 '13 at 16:57
  • I've updated an answer with an example of buffered reading into this `MutableString`. HTH – Andrey Chaschev Oct 22 '13 at 17:18
  • The code won't compile: The generic method copy(InputSupplier, OutputSupplier) of type CharStreams is not applicable for the arguments (InputSupplier, MutableString). The inferred type FileInputStream is not a valid substitute for the bounded parameter – RoundPi Oct 22 '13 at 17:47
  • My bad, this one is for memory buffers. I've updated & tested it for files. [Also available as Gist](https://gist.github.com/chaschev/7106875). – Andrey Chaschev Oct 22 '13 at 19:45