6

I see four alternatives for converting a char to a String in Java.

v = Something.lookup(new String((char)binaryData[idx])); // SORRY! Wrong.
v = Something.lookup("" + (char)binaryData[idx]);
v = Something.lookup(String.valueOf((char)binaryData[idx]));
v = Something.lookup(Character.toString((char)binaryData[idx]));

I think that the first is the slowest. The second is very convenient. I speculate that the third may return a previously created String instance, but I'm not sure, and the API documentation does not say so. The same applies to option four. Reusing instances this way would be very fortunate, because a hash-based lookup could then take advantage of hashCode() caching in String. (That feature is not described in the API documentation either, but many people have told me about it.)

I'm coming from C++ and I find this lack of complexity information disturbing. :-) Are my speculations correct? Is there any official documentation where performance guarantees and caching mechanisms are declared?

Notinlist
  • 6
    "The second is very convenient" - not if you're trying to read the code, IMO. The code uses string concatenation and an empty string, neither of which are part of what you really want to achieve. `String.valueOf` all the way, IMO - and have you *measured* any of these? Do you know that this is really the bottleneck in your code anyway? Have you tried to determine whether `String.valueof` *does* cache the values? (It's easy to tell...) Assuming `binaryData` is a `byte[]`, you could easily construct your own `String[]` with 256 strings in anyway, to absolutely guarantee caching... – Jon Skeet Mar 25 '15 at 10:28
  • 3 and 4 are the same: 4 implicitly invokes 3 :) . Also, there is no String constructor that takes a single char (case 1?) – TheLostMind Mar 25 '15 at 10:30
  • Version 2 will be translated by the compiler into: the creation of a StringBuilder object, to which your char is then appended, finally toString() will be called. So that is quite expensive. For the cost of v3 or v4 ... you could have a look into the source and see what happens there. – GhostCat Mar 25 '15 at 10:32
  • 1
    And there is one other thing to keep in mind: where is that binary data coming from? Are you sure that it really represents Java "char" values? There is no need to worry about encoding? – GhostCat Mar 25 '15 at 10:33
  • 1
    Since you aren't converting a character to string but a byte, and you do it without care for encoding and multi-byte characters, all these options are wrong. Speed is the last thing you need to worry about when your solution is incorrect in the first place. – biziclop Mar 25 '15 at 10:42
  • 1
    Just ask yourself this: in the past how many times has the character-string conversion proved a bottleneck in your application performance? And how many times have you bumped into "mysterious" encoding issues? That should give you the relative importance of speed and getting it right. But if it really was a character you were converting, anything apart from option 1 and 2 is fine. Creating new string instances with `new String()` is rarely a good idea and option 2 is a hideous misuse of implicit conversions. – biziclop Mar 25 '15 at 10:44
  • 7-bit ASCII data. @JonSkeet Nice workaround. Not measured, but these things are in a ≈1000-iteration loop that runs about a billion times (many terabytes of data). Not much else happens there besides these kinds of things. – Notinlist Mar 25 '15 at 12:09
  • 1
    @Notinlist So you are very much in the high performance area then. I would definitely go with a table of all possible strings to avoid creating billions of objects. In fact I'd probably try to avoid strings and characters altogether, if possible. – biziclop Mar 25 '15 at 12:33
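The lookup table suggested in the comments above could be sketched roughly as follows. This is only an illustration: the class name `ByteStrings` and its method `of` are hypothetical, and the sketch assumes each byte maps to the character with the same numeric value (ISO-8859-1, which covers 7-bit ASCII). Other encodings would need real decoding.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: one shared String per possible byte value. */
final class ByteStrings {
    // 256 entries, indexed by the unsigned value of the byte.
    private static final String[] TABLE = new String[256];

    static {
        for (int i = 0; i < TABLE.length; i++) {
            // Assumption: byte value == code point (ISO-8859-1).
            // For 7-bit ASCII input only entries 0..127 are ever used.
            TABLE[i] = String.valueOf((char) i);
        }
    }

    private ByteStrings() {}

    /** Returns the cached one-character String for the given byte. */
    static String of(byte b) {
        // Mask to 0..255: bytes are signed in Java, so a plain (char) cast
        // would sign-extend negative values to 0xFF80..0xFFFF.
        return TABLE[b & 0xFF];
    }

    public static void main(String[] args) {
        byte[] binaryData = "GATTACA".getBytes(StandardCharsets.US_ASCII);
        for (byte b : binaryData) {
            System.out.println(of(b));
        }
    }
}
```

Because every call returns the same instance for the same byte, no new objects are created in the hot loop, and any hash code the String has cached is reused on later lookups.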

4 Answers

9

First of all, the Java specification does not say anything about performance concerning these four methods, hence the results may vary depending on the JRE version and vendor you use.

If you use Oracle's JRE, you can easily inspect the source code by yourself! In Java 8, it is as follows:

Given a char c with some value:

  • new String(c) doesn't compile. No such constructor.
  • "" + c looks ugly, cumbersome and tricky. Internally it creates a new empty StringBuilder and appends the character to it. Then it creates a new String instance out of the StringBuilder.
  • Character.toString(c) delegates to String.valueOf(c).
  • String.valueOf(c) creates a new String instance.
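If you would rather check the behaviour on your own JRE than read the source, a small experiment along these lines may help. It is only a sketch: the class `CharToStringDemo` and its methods are made up for illustration, and the identity result depends on the implementation, so no particular output is promised for it.

```java
final class CharToStringDemo {
    /** True if the three compiling options produce equal content for c. */
    static boolean allEqual(char c) {
        String viaConcat = "" + c;
        String viaValueOf = String.valueOf(c);
        String viaCharacter = Character.toString(c);
        return viaConcat.equals(viaValueOf) && viaValueOf.equals(viaCharacter);
    }

    /** True if two separate String.valueOf calls returned the very same object. */
    static boolean valueOfReusesInstance(char c) {
        String first = String.valueOf(c);
        String second = String.valueOf(c);
        return first == second; // identity comparison, deliberately not equals()
    }

    public static void main(String[] args) {
        char c = 'x';
        System.out.println("same content:  " + allEqual(c));
        // Implementation-dependent: the description above suggests that
        // Oracle's Java 8 creates a new instance on every call.
        System.out.println("same instance: " + valueOfReusesInstance(c));
    }
}
```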

So which one to use?

The most readable!

In my view, that's String.valueOf(c) or Character.toString(c)!

isnot2bad
  • 1
    You might want to discuss string interning just to make the answer more complete. I bet all lower 256 ASCII chars are interned (cached) on boot-up. – Adam Gent Mar 25 '15 at 11:36
  • 1
    `"" + c` is a little bit faster due to a special optimization in HotSpot JVM, but I would also prefer `Character.toString` or `String.valueOf` for the sake of readability. – apangin Mar 25 '15 at 11:40
  • @AdamGent No, surprisingly `String.valueOf(char)` does not use any cache at all. It just creates a new instance of `String`, passing the given character as a one-element char array. `String#intern()` doesn't have anything to do with converting chars to strings, so I think this topic does not really fit into my answer. – isnot2bad Mar 25 '15 at 12:20
  • I will use `String.valueOf(c)` for now. Later I will implement my own `byte` => `String` mapping based on an array as Jon Skeet suggested. – Notinlist Mar 25 '15 at 12:32
  • @isnot2bad Of course it would be terrible if String.valueOf did that. What I was saying is that if he has a predictable set of chars, like only the first 256 chars, he could call String.intern on those. This would save memory and GC work if he is then storing that single-letter string in a bunch of objects, i.e. a million copies of the letter "a" vs. only one. But maybe interning doesn't make sense for such small strings. This is not to save creation time but rather memory. – Adam Gent Mar 25 '15 at 19:31
  • @AdamGent He can easily reach the same by just using a `String[]`-cache. No need to internalize. But even without having some kind of flyweight-pattern, since Java 8, memory is saved due to a technique called 'string deduplication': http://stackoverflow.com/questions/27949213/string-deduplication-feature-of-java-8 – isnot2bad Mar 25 '15 at 22:11
2

The second one is certainly (in theory) slower, as it is translated into

v = Something.lookup(new StringBuilder().append("").append((char)binaryData[idx]).toString());

StringBuilders are implemented using a char[] initialized to hold 16 values. The StringBuilder option therefore initializes a char[] of size 16, only to copy the cells that are set (only the first one in this case) to the resulting string.

String.valueOf (which is equivalent to Character.toString) uses a char[] of size 1, and then directly sets the String's backing char[], thereby avoiding the need for a copy.

The first approach will not compile (at least not under Java 7), as there is no String constructor accepting a single character as input: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html

EvenLisle
  • What I meant is that the second option does more work, and should in theory be slower. – EvenLisle Mar 25 '15 at 10:54
  • "*In theory there is no difference between theory and practice. In practice there is.*" :-) It happens that there is no noticeable difference between the three options with Hotspot JVM 8. – assylias Mar 25 '15 at 11:15
  • 1
    HotSpot JIT compiler recognizes the typical string concatenation pattern: `new StringBuilder().append().append()...toString()` and optimizes it very well. In fact, `"" + c` is **faster** than `Character.toString`. – apangin Mar 25 '15 at 11:33
  • @assylias There was in earlier versions though. The main point is that it is unlikely to matter either way, especially when the solution is incorrect in the first place. – biziclop Mar 25 '15 at 11:36
1

The first solution doesn't compile. The second solution internally creates a String by calling code similar to Character.valueOf(char). The third solution is better than the fourth, because the internal implementation of Character.toString(char c) is just a call to String.valueOf:

public static String toString(char c) {
    return String.valueOf(c);
}

The internal implementation of the third option, String.valueOf(char c), is

public static String valueOf(char c) {
    char data[] = {c};
    return new String(0, 1, data);
}
Davide Lorenzo MARINO
1

I'm coming from C++ and I find this lack of complexity information disturbing. :-) Are my speculations correct? Is there any official documentation where performance guarantees and caching mechanisms are declared?

To answer this part of the question: in general there isn't. You will find information on the asymptotic performance of built-in collections and maybe a couple of other areas but by and large these issues were left to the VM implementations' discretion. You can of course always look at the source code but bear in mind that there are things that affect performance which you have no direct control over: JIT compiling and garbage collection are the two biggest.
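Since nothing is guaranteed, measuring on your own JVM is usually the only way to know. A very crude timing sketch might look like the one below. The class `CrudeTiming`, the loop size, and the number of warm-up rounds are arbitrary choices for illustration, and a dedicated harness such as JMH is the usual choice for serious measurements, precisely because JIT compilation and garbage collection distort naive timings like this one.

```java
final class CrudeTiming {
    /** Converts n chars to Strings and returns a checksum so the work is not dead code. */
    static long run(int n) {
        long checksum = 0;
        for (int i = 0; i < n; i++) {
            char c = (char) ('a' + (i % 26));
            checksum += String.valueOf(c).charAt(0);
        }
        return checksum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        // Warm-up rounds give the JIT a chance to compile run() before timing.
        for (int i = 0; i < 5; i++) {
            run(n);
        }
        long start = System.nanoTime();
        long checksum = run(n);
        long elapsedNanos = System.nanoTime() - start;
        // The figure is only indicative; GC pauses and JIT decisions vary per run.
        System.out.println("checksum=" + checksum
                + ", ns/op=" + (double) elapsedNanos / n);
    }
}
```

Swapping the body of the loop for `"" + c` or `Character.toString(c)` gives a rough comparison, but treat small differences as noise.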

Should you be disturbed by this? I don't think so. Java operates on the premise that low-level performance is rarely an issue the application developer needs to concern herself or himself with. It's a trade-off, and you can argue whether it's a good trade-off or not, but it is what it is.

But by the time you get to the point where you can develop really high performance systems, you'll have picked up all the necessary information along the way.

biziclop
  • I would add more upvotes if I could, because this is the only answer that deals with the second part of the question. – Notinlist Mar 25 '15 at 12:44
  • 1
    @Notinlist Just a small addendum: your question (or rather, the answers you got) demonstrates how this policy of giving very few implementation guarantees works in practice. `""+c` started out as a "bad habit", something that was very much frowned upon and it was indeed slow. But as of Java 8, this pattern is apparently recognised by the JIT compiler, which produces as fast a code for it as all the other solutions. If there were more guarantees on implementation detail, maybe this wouldn't have been possible. – biziclop Mar 25 '15 at 13:10