Java high amount of char[], how to reduce?

Question

I believe this garbage is created when I call new String in various places throughout my application. How can I "create" a string without making a new object each time?

The reason for being this garbage-sensitive is because my application cannot create garbage as we need to run close to real-time with the default Java GC.

// you can see I use the same chars array
public String getB37String() {
    long l = getLong();
    int i = 0;
    while (l != 0L) {
        long l1 = l;
        l /= 37L;
        chars[11 - i++] = validChars[(int) (l1 - l * 37L)];
    }
    return new String(chars, 12 - i, i);
}

Another example is StringBuilder.toString(), which uses new String underneath.

// and you can see that I use the same builder
public String getString() {
    builder.delete(0, builder.length());
    char ascii;
    while (0 != (ascii = (char) getUByte()) && backing.hasRemaining())
        builder.append(ascii);
    return builder.toString();
}

Well, if you need the same string content, for example "java", and you write String s = "java", a new object is created the first time; after that no new objects are created, because the string is fetched from the string pool. If you use new, a new object is created every time, regardless of whether the value already exists in the pool, so new memory is allocated on each call. — GingerHead, Jul 11 '15 at 23:16
I'm still not clear on what is the motivating force behind your question. Are you experiencing performance problems? Your title suggests that you may have too many `char` arrays. How is avoiding creating new strings going to help you with that? This feels like the infamous [XY problem](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem). Can you clarify? — sstan, Jul 11 '15 at 23:18
@sstan These `char` arrays are the ones that are backing the many strings. I need zero garbage in my application and using `new String` and `StringBuilder.toString` both create garbage. I am wondering if there are any tricks to create a string into the literal pool (or some other solution to prevent garbage)! — Jire, Jul 11 '15 at 23:20
@Jire: My point is that even if you are able to do such a trick, what about the char array that you are currently passing to the `String` constructor, or the char array that backs the `StringBuilder`? Aren't those garbage as well? But don't you need those for whatever you are doing? Are you really sure that avoiding this `String` instance creation will really solve whatever your problem is? — sstan, Jul 11 '15 at 23:24
@sstan If you notice the methods don't make new `char` array or `StringBuilder` but rather I reuse the objects. _GingerHead_ seems to have a solution below! — Jire, Jul 11 '15 at 23:29
@Jire: As I don't think that the `intern()` idea will work for what you are trying to accomplish, I think the key is understanding how your methods `getB37String()` and `getString()` are being called and used. Do you absolutely need to return strings? Can you perform some streaming instead? Otherwise, I don't see how you'll manage to avoid creating objects that will need to be collected at some point. Wouldn't you benefit more from finding ways to tune how the GC performs the collection? (How often, how long, etc...) — sstan, Jul 12 '15 at 00:24
@Jire, "creating a string into the literal pool" won't magically prevent garbage. It still has to be created. And it will still be garbage-collected when it's not used anymore. String literals aren't magical. They're just part of the interning pool. Unless you mean to pre-allocate all possible strings you'll ever need. — the8472, Jul 12 '15 at 02:03

Stephen C · Answer 1 · 2015-07-13T07:59:12.270

First an observation:

The reason for being this garbage-sensitive is because my application cannot create garbage as we need to run close to real-time with the default Java GC.

If that ("cannot create garbage") is actually a true statement¹, then you may well have started in the wrong place by picking Java as your implementation language.

Java is designed on the assumption that generation of garbage is OK. It is the "cost" of avoiding the inherent complexity (and consequent bugs) of doing explicit memory management. This assumption pervades the language design and the standard library design.

The other thing about Java that is not "in your favour" is that it strongly supports good OO design principles. In particular, with few exceptions, the APIs provide strong abstraction and are designed to prevent traps where an application could accidentally break things.

For example, when you do this:

  char[] c = new char[]{'a', 'b', 'c'};
  ...
  String s = new String(c);

the String constructor allocates a new char[] and copies the characters in c to it. Why? Because if it didn't, you would have a "leaky abstraction". Someone could do this:

  char[] c = new char[]{'a', 'b', 'c'};
  ...
  String s = new String(c);
  ...
  c[0] = 'd';

and the leaky abstraction has resulted in a change to a (supposedly) immutable object.
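
The copy described above can be observed directly. The following is a minimal sketch (the class and method names are made up for illustration) that mutates the source array after constructing the String and shows that the String is unaffected:

```java
final class DefensiveCopyDemo {
    /** Builds a String from a char[] and then mutates the array. */
    static String buildThenMutate() {
        char[] c = {'a', 'b', 'c'};
        String s = new String(c); // the constructor copies the characters of c
        c[0] = 'd';               // changes only the caller's array
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate()); // prints "abc"
    }
}
```

The price of this safety is exactly the extra char[] the question is asking about: one copy per constructed String.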

So what is "the solution"?

You could rewrite your application in C or C++ or some other programming language where you can have complete control over memory allocation. (Of course, that is a lot of work ... and there may be other reasons why you can't do this.)
You could redesign the relevant parts of your application so that they don't use String or StringBuilder or any of the standard Java classes that involve explicit or implicit (under the hood) heap allocation. It is not impossible, but it is a lot of work. For example, many standard and third-party APIs expect you to pass them String objects as parameters.
You could analyse the parts of your code that do string operations to do it "smarter" in order to reduce allocation of garbage.

Unfortunately, all of these things are likely to make your code-base larger, harder to read, harder to debug and harder to maintain.

¹ One case where it might not be true is if the problem you are really trying to solve is GC pauses. There are ways to address GC pauses that don't go as far as not creating any garbage. For example, picking a low-pause parallel GC and reducing the size of the young generation space could give you pauses that are short enough not to be noticeable. Another trick is to force a GC at points when you know that the user won't notice, e.g. when loading a new level in a game.

score 2 · Answer 2 · edited May 23 '17 at 11:58

Difference Between Both

The reference is here.

Both are ordinary String objects, like any other object, but:

Since String is one of the most used types in any application, the Java designers took a step further to optimize uses of this class. That's why they came up with the idea of caching all String instances created inside double quotes, e.g. "Java". These double-quoted literals are known as String literals, and the cache which stores these String instances is known as the String pool.

At a high level both are String objects, but the main difference is that the new operator always creates a new String object, whereas Strings created using literals are interned.
String a = "Java";
String b = "Java";
System.out.println(a == b);  // prints true
Here two different objects are created and they have different references:
String c = new String("Java");
String d = new String("Java");
System.out.println(c == d);  // prints false
Similarly, when you use == to compare a String literal with a String object created using the new operator, it returns false, as shown below:
String e = "JDK";
String f = new String("JDK");
System.out.println(e == f);  // prints false

Garbage Collectors

The reference is here.

In fact the String objects that correspond to String literals typically are not candidates for garbage collection. This is because there is an implicit reference to the string object in the code of every method that uses the literal. This means that the String is reachable for as long as the method could be executed.

However, this is not always the case. If the literal was defined in a class that was dynamically loaded (e.g. using Class.forName(...)), then it is possible to arrange that the class is unloaded. If that happens, then the String object for the literal will be unreachable, and will be reclaimed when the heap containing the interned String gets GC'ed.

String Pool

The reference is here.

java.lang.String.intern() returns an interned String, that is, one that has an entry in the global String pool. If the String is not already in the global String pool, then it will be added.

It follows that for any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.

So if you call intern() on a String, the result is guaranteed to be from a pool of unique strings.
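
To make the behaviour of intern() concrete, here is a small sketch (the class name is hypothetical). It shows that intern() returns the pooled instance, but also that the String passed to it has already been allocated by new String, so interning by itself does not remove that allocation:

```java
final class InternDemo {
    /** Returns true if a freshly built string interns to the literal's instance. */
    static boolean internReturnsPooledInstance() {
        String literal = "Java";              // resolved from the string pool
        char[] chars = {'J', 'a', 'v', 'a'};
        String fresh = new String(chars);     // a new object is still allocated here
        String pooled = fresh.intern();       // lookup only; returns the pooled instance
        return fresh != literal && pooled == literal;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledInstance()); // prints "true"
    }
}
```

In other words, the temporary String still becomes garbage; only the long-lived reference is shared.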

answered Jul 11 '15 at 23:07

GingerHead

I know that string literals are cached, I am wondering if I can somehow achieve the same behavior through `new String` or another way. – Jire Jul 11 '15 at 23:09
As you can read above, when you use new, the Java runtime does not search the pool; it directly allocates new memory for the string, so it is faster to create. – GingerHead Jul 11 '15 at 23:12
I know this. I want to search in the pool! Is there a way to access it? – Jire Jul 11 '15 at 23:14
`intern` looks like what I need! What is the use of `StringConstantPool` by the way? Wouldn't `intern` use the pool inside `String`? – Jire Jul 11 '15 at 23:31
You are correct, it was just an example to demonstrate how it is used by java as the constantPool variable was to represent the string pool. Look at my edit. – GingerHead Jul 11 '15 at 23:39
So, how does using `intern()` avoid creating a `String` instance? Don't you have to create the string first to then be able to call `intern()` and fetch the equivalent string from the pool? – sstan Jul 11 '15 at 23:42
@sstan This is exactly what I'm thinking. I'm unsure how it would work if I have to use `new String` first as in my first example `new String(chars, 12 - i, i).intern();` – Jire Jul 11 '15 at 23:46
Well, if you apply intern, you can be sure that the value in hand is from the pool. The pool should contain only one string with this value, so the next time you declare a string with the same value you can be sure that no new string is created. – GingerHead Jul 11 '15 at 23:52
@GingerHead I don't understand, could you show an example? – Jire Jul 11 '15 at 23:57
look at this: http://examples.javacodegeeks.com/core-java/lang/string/java-string-intern-example/ – GingerHead Jul 11 '15 at 23:58
@GingerHead So even though I use `new String` it will not create another instance if I `intern`? – Jire Jul 12 '15 at 00:34
@GingerHead Then how will this help? Effectively I want to pool without using `new`, but I know of no way to create a string without `new`. – Jire Jul 12 '15 at 01:22
-1, because this answer plagiarizes content from both the linked blog post and other Stack Overflow answers. The section on garbage collection is copied from [here](http://stackoverflow.com/a/18407081/1247781), and parts of the string pool section are plagiarized from [here](http://stackoverflow.com/a/19049928/1247781) and the [`intern()` documentation](http://docs.oracle.com/javase/8/docs/api/java/lang/String.html#intern--). – FThompson Jul 12 '15 at 02:04
@Vulcan The URLs are given in the answer as further reading. Feel free to add all the references to the answer; I didn't have time to add them all, so I added only one. – GingerHead Jul 12 '15 at 02:05
@GingerHead It's still plagiarism unless you make it clear that you're quoting another source; this is why the block quote markup exists. Furthermore, you left no link to the original content on either of the other two sections; this is clearly and undoubtedly plagiarism, and has no place at SO. You should edit your answer to thoroughly indicate what you've quoted from elsewhere (via blockquote markup) and properly link to these sources. – FThompson Jul 12 '15 at 02:08
It's unfair to the original authors to simply copy-paste from other sources without making it clear that you aren't the original author of the text. – FThompson Jul 12 '15 at 02:10
@Vulcan I know this exactly, and as I said before I didn't have time, so I referenced only one. Since you know all the references, please feel free to add them. – GingerHead Jul 12 '15 at 02:12
@GingerHead If I was confident I'd found every source you plagiarized, I'd edit the citations and blockquotes in myself, but because I'm unsure of what exactly you've copied from where (the three sources I did find didn't totally cover your answer), I'm not able to confidently fix your answer. – FThompson Jul 12 '15 at 02:13
@GingerHead You really should use blockquotes, because those are indeed quoted from other sources. The block quote is the proper way to cite another source verbatim on Stack Overflow. – FThompson Jul 12 '15 at 02:21
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/83055/discussion-between-gingerhead-and-vulcan). – GingerHead Jul 12 '15 at 02:21
@Vulcan could we continue in chat – GingerHead Jul 12 '15 at 02:27
@GingerHead I don't believe chat is necessary here, and I'm about to leave my desk anyway. You can refer to [this great resource](http://meta.stackexchange.com/a/160078/259611) if you have any questions regarding what exactly is considered plagiarism, or how to properly add attribution. Thanks for being helpful in fixing up your answer :) – FThompson Jul 12 '15 at 02:30

the8472 · Answer 3 · 2015-07-12T02:23:29.603

If you're using Java 8u20 or newer you can try using -XX:+UseG1GC -XX:+UseStringDeduplication to enable string deduplication.

While this won't avoid the creation of garbage it might reduce memory pressure.

If you really want to create String instances without the copying cost of the char[] array you will have to access the package-private constructor java.lang.String.String(char[], boolean) or the private char[] value field via reflection, with the appropriate runtime checks/error reporting whether it actually works.

I wouldn't recommend it, but it's an option.

Another option is to stop using Strings and work with ByteBuffer. You can slice them as needed, return views, return read-only views, recycle them.

And they're also more compact if you work with utf-8 data. The downside is that you can't use APIs that require Strings.
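
As a rough illustration of the ByteBuffer approach, the sketch below returns a read-only view of a region of a backing buffer. The class and method names are hypothetical. Note that duplicate(), slice() and asReadOnlyBuffer() each create a small buffer object, so this avoids copying the bytes rather than avoiding allocation entirely; such views could be recycled if that matters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferViewDemo {
    /** Returns a read-only view of bytes [offset, offset + length) without copying them. */
    static ByteBuffer view(ByteBuffer backing, int offset, int length) {
        ByteBuffer dup = backing.duplicate(); // shares content, own position and limit
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice().asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        ByteBuffer backing = ByteBuffer.wrap("hello world".getBytes(StandardCharsets.UTF_8));
        ByteBuffer word = view(backing, 6, 5);
        backing.put(6, (byte) 'W');             // the change is visible through the view
        System.out.println((char) word.get(0)); // prints "W"
    }
}
```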

Or just deal in CharSequence/StringBuilder/CharBuffer objects in as many places as you can.
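
A minimal sketch of that idea, applied to the base-37 decoding from the question: the result is written into a reused StringBuilder and handed out as a CharSequence, so no String is created. The class name and the 37-symbol alphabet are assumptions (the question does not show its validChars table), and the sketch assumes non-negative input.

```java
final class Base37Decoder {
    // Hypothetical alphabet: the question's validChars table is not shown.
    private static final char[] VALID_CHARS =
            "_abcdefghijklmnopqrstuvwxyz0123456789".toCharArray();

    // Reused for every call; 13 chars is enough for any non-negative long in base 37.
    private final StringBuilder builder = new StringBuilder(13);

    /**
     * Decodes a non-negative value into the shared builder.
     * The returned view is only valid until the next call to decode().
     */
    CharSequence decode(long l) {
        builder.setLength(0);
        while (l != 0L) {
            builder.append(VALID_CHARS[(int) (l % 37L)]);
            l /= 37L;
        }
        return builder.reverse(); // digits were produced least significant first
    }

    public static void main(String[] args) {
        Base37Decoder decoder = new Base37Decoder();
        System.out.println(decoder.decode(38L)); // prints "aa"
    }
}
```

The trade-off is that callers must consume or copy the characters before the next call, because the same buffer is overwritten each time.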

Depending on use-cases you can also create a string cache for your computation. Map<T, String> where T is the input parameter of your computation. This way you will only ever need 1 String for each possible value of T.
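
One way such a cache might look, as a sketch under stated assumptions: the key type T is taken to be a long (as in getB37String), and the class name and the compute function are hypothetical. Boxing the key into a Long can itself allocate, so a truly garbage-free version would need a primitive-keyed map; this only shows the caching idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

final class StringCache {
    private final Map<Long, String> cache = new HashMap<>();
    private final LongFunction<String> compute; // hypothetical: builds the String for a key

    StringCache(LongFunction<String> compute) {
        this.compute = compute;
    }

    /** Returns the cached String for key, computing it only on the first request. */
    String get(long key) {
        Long boxed = key;               // boxing may allocate for values outside the Long cache
        String s = cache.get(boxed);
        if (s == null) {
            s = compute.apply(key);     // the only place a new String is created
            cache.put(boxed, s);
        }
        return s;
    }

    public static void main(String[] args) {
        StringCache cache = new StringCache(k -> "id-" + k);
        String first = cache.get(42L);
        System.out.println(first == cache.get(42L)); // prints "true"
    }
}
```

This only pays off when the set of distinct inputs is small enough to keep in memory; otherwise the cache itself becomes the memory problem.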

return new String(chars, 12 - i, i);

Note that as of Java 8 strings do not store an internal offset, i.e. String objects are not a "view" on some potentially larger backing char array.

This used to be different in the past, but since it was an implementation detail it got changed.

It might be possible to undo that change with a custom String class added via the bootstrap classloader, but that's more likely to break or cause severe performance degradation than not.

as we need to run close to real-time with the default Java GC.

This may be your actual problem.

None of the collectors configured by default provides anything that comes even close to realtime behavior. CMS or G1 can provide much lower pause times, especially on large heaps, than either the Serial or ParallelOld collectors.

answered Jul 12 '15 at 01:38

I'm pretty sure that string de-duplication won't help. The OP's problem is the amount of garbage being generated. Deduplication is about reducing the space needed to represent non-garbage strings. – Stephen C Jul 12 '15 at 02:04
@StephenC He *claims* the amount of garbage generated is the problem. I'm not certain that this actually is the case. Generating more garbage causes the GC to run more often, it does not necessarily increase pause times as long as objects are short-lived enough. And I mention various other options anyway. – the8472 Jul 12 '15 at 02:11
Well yes. You do suggest other options. But I'm suggesting that the first option you present is unlikely to help. It is not a good idea to write Answers like that ... – Stephen C Jul 12 '15 at 02:39
@StephenC. The question is called *"Java high amount of char[], how to reduce?"*. My first suggestion does that, reducing the amount of char arrays on the heap. ¯\_(ツ)_/¯. Honestly I have difficulty extracting what he actually wants. So I'm just throwing options out there. Something might stick. – the8472 Jul 12 '15 at 02:50
I know that. But you are not "getting" what I am saying to you. – Stephen C Jul 12 '15 at 02:52
@the8472 Sorry for not commenting this earlier but I wanted to say that I enjoyed your answer a lot. I ended up using your advice to directly access the constructor. This reflective action requires more demand on the CPU so I ended up making my own pool by hash and thus my own garbageless hash map (since Java's produces garbage). I have since moved away from the reflection approach to instead simply iterate through each character and use concatenation which is autointerned. I kept the same pooling. :) – Jire Mar 02 '16 at 10:03
@Jire, you can turn reflected methods into methodhandles and invoke them via the lambdametafactory, that should avoid the costs of reflection. – the8472 Mar 02 '16 at 13:36
