What is the purpose of the expression "new String(...)" in Java?

Question

While looking at online code samples, I have sometimes come across an assignment of a String constant to a String object via the use of the new operator.

For example:

String s;
...
s = new String("Hello World");

This, of course, compared to

s = "Hello World";

I'm not familiar with this syntax and have no idea what the purpose or effect would be. Since String constants typically get stored in the constant pool and then in whatever representation the JVM has for dealing with String constants, would anything even be allocated on the heap?

Take a look at this blog post. http://kjetilod.blogspot.com/2008/09/string-constructor-considered-useless.html — Ruggs, Dec 24 '08 at 04:01
@Ruggs well, thanks for the link, but it would be nice if you added the disclaimer about the caveats, [as this guy did](http://stackoverflow.com/a/390854/719662). — , Jan 14 '16 at 20:47
https://help.semmle.com/wiki/display/JAVA/Inefficient+String+constructor — Vadzim, Aug 20 '18 at 16:31
Regardless of the answers on how/why `new String(String)` should be used, `s = new String("Hello World")` where the parameter is a literal does not make sense in Java, and probably never will. — Maarten Bodewes, Oct 07 '19 at 12:19

Lawrence Dol · Accepted Answer · 2020-04-15T18:40:08.717

83

The one place where you may think you want new String(String) is to force a distinct copy of the internal character array, as in

small=new String(huge.substring(10,20));

However, this behavior is unfortunately undocumented and implementation dependent.

I have been burned by this when reading large files (some up to 20 MiB) into a String and carving it into lines after the fact. I ended up with all the strings for the lines referencing the char[] consisting of the entire file. Unfortunately, that unintentionally kept the entire array reachable for the few lines I held on to longer than it took to process the file. I was forced to use new String() to work around it, since processing 20,000 files very quickly consumed huge amounts of RAM.

The only implementation agnostic way to do this is:

small=new String(huge.substring(10,20).toCharArray());

This unfortunately must copy the array twice, once for toCharArray() and once in the String constructor.
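A minimal sketch of the idiom above, wrapped in a hypothetical helper named `detachedCopy` (the class and method names are made up for illustration). It relies only on the documented statement that `String(char[])` copies the contents of the character array; it does not show what any particular JVM does internally.

```java
public class DetachedCopy {
    // Hypothetical helper: returns the characters in [begin, end) as a String
    // built from a freshly allocated char[].
    public static String detachedCopy(String source, int begin, int end) {
        char[] chars = source.substring(begin, end).toCharArray(); // first copy
        return new String(chars); // second copy, documented by String(char[])
    }

    public static void main(String[] args) {
        String huge = "0123456789abcdefghijklmnopqrstuvwxyz";
        String small = detachedCopy(huge, 10, 20);
        System.out.println(small);          // abcdefghij
        System.out.println(small.length()); // 10
    }
}
```

Both copies here are of the small range only, so the cost is proportional to the substring, not to the original string.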

There needs to be a documented way to get a new String by copying the chars of an existing one; or the documentation of String(String) needs to be improved to make it more explicit (there is an implication there, but it's rather vague and open to interpretation).

Pitfall of Assuming what the Doc Doesn't State

In response to the comments, which keep coming in, observe what the Apache Harmony implementation of new String() was:

public String(String string) {
    value = string.value;
    offset = string.offset;
    count = string.count;
}

That's right, no copy of the underlying array there. And yet, it still conforms to the (Java 7) String documentation, in that it:

Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.

The salient piece being "copy of the argument string"; it does not say "copy of the argument string and the underlying character array supporting the string".

Be careful that you program to the documentation and not one implementation.
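To make the distinction concrete, here is a toy model, not `java.lang.String` and not the Harmony source: a hypothetical `ToyString` class with the same three fields as the constructor quoted above. It sketches how a "copy" can be a new object and still share the backing array.

```java
public class SharedArrayModel {
    // Toy stand-in with (value, offset, count) fields; an assumption for illustration only.
    public static final class ToyString {
        public final char[] value;
        public final int offset;
        public final int count;

        public ToyString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        // Copy constructor in the style quoted above: new object, same array.
        public ToyString(ToyString other) {
            this(other.value, other.offset, other.count);
        }

        // View-style substring: shares the backing array instead of copying it.
        public ToyString substring(int begin, int end) {
            return new ToyString(value, offset + begin, end - begin);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        char[] data = "a very large file body".toCharArray();
        ToyString huge = new ToyString(data, 0, data.length);
        ToyString line = huge.substring(2, 6);
        ToyString copy = new ToyString(line);

        System.out.println(copy);                     // very
        System.out.println(copy != line);             // true: a distinct object
        System.out.println(copy.value == huge.value); // true: still the whole array
    }
}
```

In this model, holding on to `copy` keeps the entire `data` array reachable, which is the memory problem described earlier in the answer.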

edited Apr 15 '20 at 18:40

answered Dec 24 '08 at 05:57

Lawrence Dol


"However, this behavior is unfortunately undocumented and implementation dependent." JavaDoc for the `String(String)` constructor says "Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable. " which is a roundabout way of saying said constructor makes an explicit copy of the underlying `char[]` of `String` passed to it. – Powerlord Oct 13 '10 at 21:13

@R. Bemrose. No, that's what it *could* imply. What it states is that you will get a new copy of the *String* object - it makes no statement about the constituent contents of said object. A new String sharing the underlying array is still a new String and a copy of the old one. Contrast that to `String(char[] value)`, where it explicitly states: The contents of the character array are copied. – Lawrence Dol Oct 14 '10 at 18:16

@Monkey I disagree with your interpretation of the Javadoc. We can leverage our boolean equations here: "Unless an explicit copy of original is needed, use of this constructor is unnecessary" translates to `!explicitCopyRequired --> constructor is unnecessary` (--> meaning 'implies') and the rule `a --> b` <==> `!b --> !a` to reveal `constructor is necessary --> explicitCopyRequired`. If you're using the constructor, it's because you need an explicit copy. Now, I agree it could be better worded, but when you break it down, it's clear that this constructor, by contract, makes an explicit copy. – corsiKa Mar 28 '11 at 17:10

@Glowcoder - make all the inferences you want, that's *not* what the JavaDoc states. And IIRC at least one major JVM implementation implemented `new String(String)` without making a copy of the underlying character array - it might have been Apache Harmony, but I am not sure. – Lawrence Dol Mar 29 '11 at 01:58
Late to the game, but I believe it is the `substring` method that is not creating a copy (instead it creates a view). The constructor definitely creates a copy. – Dunes Nov 05 '12 at 13:13
@Dunes: In short, you need to differentiate between what a method ***does*** and what it's ***documented to do***. Substring may be implemented to only create a view, but it's not documented to only do so, so it could *copy* the array, or part of it, or whatever. Likewise, the constructor is not documented to copy the underlying array so it might and it might not on a given JVM. – Lawrence Dol Nov 06 '12 at 10:58
BTW, one reason I might want to use `new String(other)` but not care whether the underlying array is copied is because I want a different *String* object so I can then elsewhere test `if(someString!=anotherString) {`. There are (rare) cases where that is legitimate. – Lawrence Dol Nov 08 '12 at 21:53
Is `small = new StringBuffer(huge.substring(10, 20)).toString();` consistent or could that reuse the original array? (Note that in the version of the JDK that I had to hand it returns a string whose internal array is 26 characters, but that's still better than 20MB.) – Neil Jan 09 '13 at 15:03
@Neil: What you've written could conceivably result in the final string sharing the same array as the original; nothing in the API contracts prevents that. For example, `StringBuilder/Buffer` could be implemented using copy-on-first-write semantics and could also share the builder array with the new `String` provided that the builder made a copy before any change to the array elements subsequent to a `toString()` invocation. – Lawrence Dol Jan 09 '13 at 23:49
@SoftwareMonkey Ah, older versions of `StringBuffer` did in fact share their buffer with any new strings created (and even after appending) although I see they don't bother these days. – Neil Jan 10 '13 at 23:45
@Neil: Yes, subsequent appending to the char array would be legitimate without making a copy since the `String` would not be impacted due to its start/end indices. – Lawrence Dol Jan 11 '13 at 00:44
@LawrenceDol: For a `StringBuilder` to legitimately use its backing store to construct a new string, it would have to be able to guarantee that no operation which could modify that backing store could possibly be in progress. While it would be perfectly proper for illegitimate use of a `StringBuilder` by multiple threads to throw an exception or create a string encapsulating an unexpected sequence of characters, it would not be proper to have one thread create a string while the other thread modified it. The cost of ensuring single-thread action would likely be greater than... – supercat Apr 30 '14 at 23:05
...the cost of having `ToString` create a new backing store, though it's possible that re-copy on write could be more efficient if a `StringBuilder` kept track of the last thread to use it and recopied its backing store every time it was accessed from a different thread. – supercat Apr 30 '14 at 23:07
@supercat: That's true about `StringBuilder`, which is effectively the counterpart to `StringBuffer` for a single thread; `StringBuffer` OTOH can make such guarantees and, at least in the past, did indeed share its backing array with `String`s. None of that, however, changes the essential points I make in my answer. – Lawrence Dol May 03 '14 at 02:24
I probably have to read this answer few more times to understand it completely. – Learner May 21 '14 at 07:05

Note that in recent versions of the Oracle JVM substrings **do not share the underlying array** in the way that is described in this answer. `substring` instead copies the data to a new array. This started in Java 1.7 update 6. [See here](http://stackoverflow.com/a/20275133/452775). – Lii Oct 02 '16 at 09:33
@Lii : And that is precisely my point; apps that rely on undocumented behavior are doomed to be borked. – Lawrence Dol Oct 03 '16 at 00:49
@LawrenceDol Sure thing, but as this is now the default way for many versions and since OpenJDK is really the leading implementation nowadays, I would strongly suggest that you include this comment into your answer. – Maarten Bodewes Oct 07 '19 at 12:18
@MaartenBodewes : The comment by Lii is irrelevant to my answer because it pertains to an undocumented implementation detail. My answer is specifically that what implementations do is irrelevant; what matters is what the documented contract states. No one should rely on the current undocumented behavior of OpenJDK or any other JVM implementation. – Lawrence Dol Apr 15 '20 at 18:34
Yeah, I agree that it should probably be in the documentation that `new String` makes a copy (or at least doesn't reference a large(r) string). OTOH, it is nice that the OpenJDK at least does this now by default. – Maarten Bodewes Apr 15 '20 at 22:32

score 10 · Answer 2 · answered Dec 24 '08 at 04:28


The only time I have found this useful is in declaring lock variables:

private final String lock = new String("Database lock");

....

synchronized(lock)
{
    // do something
}

In this case, debugging tools like Eclipse will show the string when listing what locks a thread currently holds or is waiting for. You have to use "new String", i.e. allocate a new String object, because otherwise a shared string literal could possibly be locked in some other unrelated code.
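A small sketch of why the `new` matters for a lock, using a hypothetical class `LockIdentityDemo`. It only demonstrates object identity; it does not reproduce what any debugger displays. The `plainLock` field shows the common alternative of a dedicated `Object`.

```java
public class LockIdentityDemo {
    // A distinct String object: no other code can reach it through a literal.
    private final String namedLock = new String("Database lock");
    // Common alternative: a private Object used only as a monitor.
    private final Object plainLock = new Object();
    private int counter;

    public int increment() {
        synchronized (namedLock) {
            return ++counter;
        }
    }

    public int incrementWithPlainLock() {
        synchronized (plainLock) {
            return ++counter;
        }
    }

    // Identity comparison on purpose: is the lock a different object from the literal?
    public boolean namedLockIsDistinctFromLiteral() {
        return namedLock != "Database lock";
    }

    // Two identical literals refer to the same object, so locking on one locks the other.
    public static boolean literalsAreShared() {
        String a = "Database lock";
        String b = "Database lock";
        return a == b;
    }

    public static void main(String[] args) {
        LockIdentityDemo demo = new LockIdentityDemo();
        System.out.println(demo.namedLockIsDistinctFromLiteral()); // true
        System.out.println(literalsAreShared());                   // true
        System.out.println(demo.increment());                      // 1
    }
}
```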


Dave Ray



I think it's better to use `private static class Lock {}; private final Lock lock = new Lock();`, as the class name shows up pretty much everywhere. However, it will cost you a couple of K, because HotSpot isn't that efficient. – Tom Hawtin - tackline Dec 24 '08 at 12:33
I can buy that. Maybe I'll give it a try next time. – Dave Ray Dec 26 '08 at 01:55

Object lock = new Object() does the same thing ;-) – Hardcoded Nov 26 '09 at 13:57
@Hardcoded Right, except for that it's not `Serializable`, so I prefer `new Object[0]`. – maaartinus Jun 18 '15 at 12:48

@maaartinus What's the point of serializing sync-monitors? Sync won't work between the instances, so it's pointless. In contrast, my monitors are always final. – Hardcoded Jul 09 '15 at 12:38
@Hardcoded Sure, you can (and should) make it transient and get a slightly smaller serialized form. – maaartinus Jul 09 '15 at 15:37

Vikas · Answer 3 · 2014-01-18T20:01:39.013

6

String s1="foo"; The literal goes into the string pool and s1 refers to it.

String s2="foo"; This time the pool is checked for the literal "foo"; since it already exists, s2 refers to the same pooled object.

String s3=new String("foo"); The literal "foo" is placed in the string pool first; then the String-argument constructor creates a separate String object on the heap (because of the new operator), and s3 refers to that object.

String s4=new String("foo"); Same as s3.

So System.out.println(s1==s2); // true, because both refer to the same pooled literal.

And System.out.println(s3==s4); // false, because s3 and s4 are distinct objects created at different places on the heap.
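The comparisons above can be run as a single program. This is a minimal sketch with a hypothetical class name; the `equals` and `intern()` lines are additions for contrast and are not part of the original answer.

```java
public class StringIdentityDemo {
    // Returns the results of the identity and equality checks, in order.
    public static boolean[] results() {
        String s1 = "foo";
        String s2 = "foo";
        String s3 = new String("foo");
        String s4 = new String("foo");
        return new boolean[] {
            s1 == s2,          // same pooled literal
            s3 == s4,          // two distinct objects
            s1 == s3,          // literal vs. newly created object
            s3.equals(s4),     // same character sequence
            s3.intern() == s1  // intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        boolean[] r = results();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false
        System.out.println(r[2]); // false
        System.out.println(r[3]); // true
        System.out.println(r[4]); // true
    }
}
```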

edited Jan 18 '14 at 20:01

answered Jan 18 '14 at 19:28

Vikas



Not sure why this is upvoted... It doesn't answer the question in any way. – Christopher Schneider May 19 '17 at 17:06
I found it helpful because it answered a question I couldn't find, which was: "Do strings created with the constructor skip interning" – beartech1 Feb 21 '19 at 19:31

score 4 · Answer 4 · answered Sep 11 '13 at 12:43


The sole utility for this constructor described by Software Monkey and Ruggs seems to have disappeared from JDK7. There is no longer an offset field in class String, and substring always uses

Arrays.copyOfRange(char[] original, int from, int to)

to trim the char array for the copy.
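For reference, a short sketch of what `Arrays.copyOfRange` does on its own. The class and the `trim` helper are hypothetical, and this is not the JDK's `substring` source; it only shows that the returned array is independent of the original.

```java
import java.util.Arrays;

public class CopyOfRangeDemo {
    // Hypothetical helper: copies the range [from, to) into a new, right-sized array.
    public static char[] trim(char[] original, int from, int to) {
        return Arrays.copyOfRange(original, from, to);
    }

    public static void main(String[] args) {
        char[] whole = "0123456789abcdefghij".toCharArray();
        char[] part = trim(whole, 10, 20);

        whole[10] = 'X'; // changing the original does not affect the copy

        System.out.println(new String(part)); // abcdefghij
        System.out.println(part.length);      // 10
    }
}
```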


MasterKiller



Until, and unless it's documented to do so, you are still relying on an implementation side-effect, not a documented action. Just because JDK7 happens to copy the array in `substring()` does not mean it must in every implementation (and OpenJDK is not the only JVM implementation out there). – Lawrence Dol Nov 13 '13 at 20:36
Indeed, but the problem described here IS a side-effect of the particular implementation, not something that is specified or documented anywhere. So it is relevant. – MasterKiller Nov 28 '13 at 16:11

score 2 · Answer 5 · edited Nov 26 '09 at 12:07

Well, that depends on what the "..." is in the example. If it's a StringBuffer, for example, or a byte array, or something, you'll get a String constructed from the data you're passing.
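A brief sketch of those other constructors, using hypothetical helper names. Each one builds a String from data that is not already a String, which is where `new String(...)` is the natural tool.

```java
import java.nio.charset.StandardCharsets;

public class OtherConstructorsDemo {
    // Decodes bytes using an explicit charset.
    public static String fromBytes(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Captures the current contents of a StringBuffer.
    public static String fromBuffer(StringBuffer buffer) {
        return new String(buffer);
    }

    // Copies the contents of a char array.
    public static String fromChars(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(fromBytes(new byte[] {72, 105}));          // Hi
        System.out.println(fromBuffer(new StringBuffer("buffered"))); // buffered
        System.out.println(fromChars(new char[] {'o', 'k'}));         // ok
    }
}
```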

But if it's just another String, as in new String("Hello World!"), then it should be replaced by simply "Hello World!", in all cases. Strings are immutable, so cloning one serves no purpose -- it's just more verbose and less efficient to create a new String object just to serve as a duplicate of an existing String (whether it be a literal or another String variable you already have).

In fact, Effective Java (which I highly recommend) uses exactly this as one of its examples of "Avoid creating unnecessary objects":

As an extreme example of what not to do, consider this statement:

String s = new String("stringette");  // DON'T DO THIS!

(Effective Java, Second Edition)

Just because you use the overload containing a string doesn't mean it's pointless - see Software Monkey's answer. — Jon Skeet, Dec 24 '08 at 07:44

Tashkhisi · Answer 6 · 2020-08-30T20:21:38.357

Here is a quote from the book Effective Java Third Edition (Item 17: Minimize Mutability):

A consequence of the fact that immutable objects can be shared freely is that you never have to make defensive copies of them (Item 50). In fact, you never have to make any copies at all because the copies would be forever equivalent to the originals. Therefore, you need not and should not provide a clone method or copy constructor (Item 13) on an immutable class. This was not well understood in the early days of the Java platform, so the String class does have a copy constructor, but it should rarely, if ever, be used.

So this was a mistake in Java's design: since the String class is immutable, it should not have been given a copy constructor. If you need to perform costly operations on an immutable class, use its public mutable companion class; for String, those are StringBuilder and StringBuffer.
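As a small illustration of the companion-class idea, the sketch below builds a string step by step in a StringBuilder and produces one immutable String at the end. The class and the `join` helper are hypothetical names chosen for the example.

```java
public class BuilderDemo {
    // Hypothetical helper: joins the parts with a separator using a mutable builder.
    public static String join(String[] parts, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString(); // one immutable String at the end
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] {"a", "b", "c"}, ',')); // a,b,c
    }
}
```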

score -1 · Answer 7 · answered May 15 '13 at 11:58

There are two ways to create Strings in Java. Examples of both follow.

1) Declare a variable of type String (a class in Java) and assign it a value written between double quotes. This creates a string in the string pool area of memory. E.g.: String str = "JAVA";

2) Use the constructor of the String class and pass a string (within double quotes) as the argument. E.g.: String s = new String("JAVA"); This creates a new string JAVA in main memory, and also in the string pool if that string is not already present there.

Then there's `substring`, `trim`, `replace`, `replaceAll`, `StringBuilder`, `StringBuffer`, etc. — Lawrence Dol, Nov 13 '13 at 20:38

score -1 · Answer 8 · answered Dec 24 '08 at 04:16

Generally, this indicates someone who isn't comfortable with the new-fashioned C++ style of declaring when initialized.

Back in the C days, it wasn't considered good form to define auto variables in an inner scope; C++ eliminated the parser restriction, and Java extended that.

So you see code that has

int q;
for(q=0;q<MAX;q++){
    String s;
    int ix;
    // other stuff
    s = new String("Hello, there!");
    // do something with s
}

In the extreme case, all the declarations may be at the top of a function, and not in enclosed scopes like the for loop here.

In general, though, the effect of this is to cause a String ctor to be called once, and the resulting String thrown away. (The desire to avoid this is just what led Stroustrup to allow declarations anywhere in the code.) So you are correct that it's unnecessary and bad style at best, and possibly actually bad.

Either I completely don't understand this, or it's not related to the actual question. The question is about the difference between ...="Hello" and ...=new String("Hello"). You seem to be talking about the difference between "String s=..." and "String s; ...; s=...". — Dave Costa, Dec 24 '08 at 14:10
And I have to heartily disagree with "back in the C days, it wasn't considered good form to define auto variables in an inner scope", at least since the C days of 1991; in my experience it has always been considered best to declare variables in C with as narrow a scope as the compiler would allow, which, again since 1991, has been at least at the top of any block. — Lawrence Dol, Nov 29 '13 at 18:40

score -2 · Answer 9 · answered Dec 24 '08 at 04:19

I guess it will depend on the code samples you're seeing.

Most of the time, the constructor "new String()" appears in code samples only to demonstrate a very well-known Java class instead of creating a new one.

You should avoid using it most of the time, not only because string literals are interned but mainly because strings are immutable. It doesn't make sense to have two copies that represent the same object.

While the article mentioned by Ruggs is "interesting", the technique should not be used except in very specific circumstances, because it could do more harm than good. You would be coding to an implementation rather than a specification, and the same code might not behave the same way on, for instance, JRockit, the IBM VM, or others.

It makes sense to have two copies if you want to throw away one of them, and it's a lot bigger than the other... — Jon Skeet, Dec 24 '08 at 07:45
If one is bigger than the other, they would be two different objects in the first place, wouldn't they? :-/ — OscarRyz, Dec 26 '08 at 17:19
