
Given a reference to any String, is it possible to programmatically determine whether this is a reference to a compile time constant?
Or, if that is not possible, can one determine whether it is stored in the intern pool without doing `s.intern() == s`?

isConst("foo")                       -> true
isConst("foo" + "bar")               -> true   // 2 literals, 1 compile time string
isConst(SomeClass.SOME_CONST_STRING) -> true
isConst(readFromFile())              -> false
isConst(readFromFile().intern())     -> false  // true would be acceptable too

(context for comments below: the question originally asked about literals)

Bart van Heukelom
  • Why does `readFromFile().intern()` return `false` in your example? It can be the very same `"foo"` object referenced somewhere else as a String literal. – apangin Jul 10 '19 at 15:25
  • Seems like your question comes from a wrong assumption. Entries in a string pool are NOT created at class loading time. They are created lazily at the first resolution of `ldc` instruction. So, it's absolutely possible that `"foo"` literal will return the same object that has been created earlier by `readFromFile().intern()`. – apangin Jul 10 '19 at 16:20
  • A string literal is a syntactic construct of the Java programming language. There is no such thing as a string literal at runtime. Note that `5+6+""` is not a string literal, but still, `5+6+"" == "11"` evaluates to `true`, in fact, the expression will be replaced by the constant result `true` even at compile-time. In other words, “string literals”, “compile-time constants”, and “strings contained in the runtime pool” are related, but still entirely different things. A method like `isLiteral("foo")` is a contradiction in itself, as within the method, the parameter reference is never a literal. – Holger Jul 10 '19 at 17:26
  • @apangin "So, it's absolutely possible that "foo" literal will return the same object that has been created earlier by readFromFile().intern()" well in that case you may answer my question of "is it possible" with the answer "no it's not" :) – Bart van Heukelom Jul 11 '19 at 10:16
  • @Holger well I suppose compile-time constant is what I meant after all. To be clear, this is an academic question, however, a theoretical use case that made me ask this would be something like, forbidding a performance critical method to be given a newly allocated string every time. – Bart van Heukelom Jul 11 '19 at 10:19
  • Well, especially for academic questions, it is important to correctly distinguish between string literals, compile-time constants (of type `String`), and string instances contained in (referenced by) the runtime pool. – Holger Jul 11 '19 at 11:43
  • Theoretically it's possible to instrument your JVM with a JVMTI agent which intercepts the `isConst()` call, checks the stack frame, finds the call site, loads the bytecode and checks whether there was an LDC instruction just before the call. There are still some tricky cases, e.g. whether `isConst(flag ? "foo" : "bar")` should return true. I don't have experience with JVMTI agents, but I believe @apangin can do it. Though I'm not sure whether such a solution is satisfactory for the OP. – Tagir Valeev Jul 12 '19 at 04:03

1 Answer


To clarify the original question, every string literal is a compile-time constant, but not every compile-time constant has to originate from a string literal.
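The distinction can be illustrated with a small sketch. The class and member names below (`ConstantExpressionDemo`, `PREFIX`, `RUNTIME`) are made up for illustration and are not part of the question. The first three methods compare constant expressions, which the compiler folds into a single pooled string; the last one concatenates at run time and therefore yields a new object.

```java
public class ConstantExpressionDemo {
    // Constant variable: final and initialized with a constant expression.
    static final String PREFIX = "foo";

    // Not a constant variable: the initializer is a method call (hypothetical example).
    static final String RUNTIME = String.valueOf(new char[] {'f', 'o', 'o'});

    // 5 + 6 + "" is a constant expression with the value "11", although "11" is never written as a literal.
    static boolean numericConcat() {
        return (5 + 6 + "") == "11";
    }

    // Two literals, folded into one compile-time string.
    static boolean literalConcat() {
        return ("foo" + "bar") == "foobar";
    }

    // A constant variable plus a literal is still a constant expression.
    static boolean constantVariableConcat() {
        return (PREFIX + "bar") == "foobar";
    }

    // Concatenation evaluated at run time creates a new String object.
    static boolean runtimeConcat() {
        return (RUNTIME + "bar") == "foobar";
    }

    public static void main(String[] args) {
        System.out.println("5 + 6 + \"\"     : " + numericConcat());
        System.out.println("\"foo\" + \"bar\" : " + literalConcat());
        System.out.println("PREFIX + \"bar\": " + constantVariableConcat());
        System.out.println("RUNTIME + \"bar\": " + runtimeConcat());
    }
}
```

Inside any of these methods the operands are ordinary references; the difference exists only in how the compiler treated the expression.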

At runtime, there is no difference between a String object that has been constructed for a compile-time constant and one constructed by other means. Strings constructed for compile-time constants are automatically added to a pool, but other strings may be added to the same pool manually via `intern()`. Since strings are constructed and added lazily, it is even possible to construct and add a string manually, so that compile-time constants with the same value get resolved to that string later on. This answer exploits this possibility to detect when the String instance for a compile-time constant is actually resolved.
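A minimal sketch of that ordering effect follows, assuming a JVM that resolves constants lazily (as the comments above describe for HotSpot). The class name `LazyLiteralDemo` and the value `"lazy-demo"` are hypothetical. Only the first comparison is guaranteed by the language; the second depends on the implementation and on whether an equal string was already pooled, so no expected result is stated for it.

```java
public class LazyLiteralDemo {
    // Builds the text from a char array, so no literal with this value is touched here.
    static String buildAtRuntime() {
        return new String(new char[] {'l', 'a', 'z', 'y', '-', 'd', 'e', 'm', 'o'});
    }

    // Returns whatever object the compile-time constant resolves to.
    static String literal() {
        return "lazy-demo";
    }

    public static void main(String[] args) {
        String built = buildAtRuntime();
        String pooled = built.intern();   // adds a string with this content to the pool, if absent
        String lit = literal();           // first use of the constant in this run

        // Guaranteed: a compile-time constant and intern() yield the same canonical instance.
        System.out.println("literal == pooled: " + (lit == pooled));

        // Implementation-dependent: true only if intern() pooled `built` itself
        // and the constant had not been resolved to another object before.
        System.out.println("literal == built:  " + (lit == built));
    }
}
```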

It’s possible to derive from that answer a method to simply detect whether a string is in the pool or not:

public static boolean isInPool(String s) {
    return s == new String(s.toCharArray()).intern();
}

`new String(s.toCharArray())` constructs a string with the same contents that is not in the pool. Calling `intern()` on it must resolve to the same reference as `s` if `s` refers to an instance in the pool. Otherwise, `intern()` may resolve to another existing object, add our string, or add a newly constructed string and return a reference to it, depending on the implementation; in each of these cases, the returned reference will be different from `s`.

Note that this method has the side effect of adding a string to the pool if it wasn’t there before. That string will stay there at least until the next garbage collection cycle, perhaps until the next full GC, depending on the implementation.
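The following sketch exercises the method, including the side effect. The wrapper class `PoolProbe` and the helper `fresh` are hypothetical names added for illustration; `isInPool` is the method from above, unchanged. The last line prints an implementation-dependent result (it also depends on whether a garbage collection happened in between), so no expected value is given for it.

```java
public class PoolProbe {
    // The method from the answer, unchanged.
    public static boolean isInPool(String s) {
        return s == new String(s.toCharArray()).intern();
    }

    // Hypothetical helper: creates a new String object with the same contents.
    static String fresh(String text) {
        return new String(text.toCharArray());
    }

    public static void main(String[] args) {
        // A compile-time constant always refers to the pooled instance.
        System.out.println(isInPool("foo"));          // true

        // Same contents, but a distinct object outside the pool.
        System.out.println(isInPool(fresh("foo")));   // false

        // Contents that are very unlikely to be pooled already.
        String unique = fresh("probe-" + System.nanoTime());
        System.out.println(isInPool(unique));         // false

        // Side effect: the probe pooled an equal string, so `unique` itself
        // usually cannot become the canonical instance anymore.
        System.out.println(unique.intern() == unique);
    }
}
```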

The test method might be nice for debugging or satisfying curiosity, but there is no point in ever using it in production code. Application code should not depend on that property, and the use case proposed in a comment, enforcing pooled strings in performance-critical code, is not a good idea.

Besides the point that the test itself is expensive and counteracts the purpose of improving performance, the underlying assumption that pooled strings are better than non-pooled ones is flawed. Not being in the pool doesn’t imply that the application will perform an expensive reconstruction every time it invokes the performance-critical code. It may simply hold a reference in a variable or use a HashMap; both approaches are far more efficient than calling `intern()`. In fact, even temporary strings can be the most efficient solution in some cases.
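As a rough sketch of the HashMap approach mentioned above, an application can keep its own canonical instances without touching the JVM-wide pool. The class `LocalCanonicalizer` and its method `canonical` are hypothetical names, and the sketch assumes single-threaded use.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalCanonicalizer {
    // Application-owned map; entries live exactly as long as this object does.
    private final Map<String, String> cache = new HashMap<>();

    // Returns the first instance seen with these contents (not thread-safe).
    public String canonical(String s) {
        return cache.computeIfAbsent(s, key -> key);
    }

    public static void main(String[] args) {
        LocalCanonicalizer names = new LocalCanonicalizer();
        String first = new String("user-42".toCharArray());
        String second = new String("user-42".toCharArray());

        System.out.println(first == second);                                    // false
        System.out.println(names.canonical(first) == names.canonical(second));  // true
    }
}
```

Unlike `intern()`, the map can be discarded as a whole once the performance-critical phase is over.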

Holger
  • awesome comment : *The test method might be nice for debugging or satisfying curiosity, but there is no point in ever using it in production code*. It's also interesting that if I search the usages of `String::intern`, I get 281 usages, some of them in the jdk itself; I can't explain why, though. – Eugene Jul 11 '19 at 14:56
  • @Eugene most of them can be traced to XML related code from the Apache Foundation, which doesn’t surprise me. They really like to write `static final String SOME_CONSTANT = "some constant".intern();` to prevent the field from being a compile-time constant, so the field has to be read at runtime; I don’t know which esoteric advantage they think it has. Then, Swing uses the anti-pattern of calling `intern()` so that it can use `==` for comparisons afterwards, which doesn’t surprise me either. The remaining ~40 cases are related to Reflection and Serialization, where it is justified. – Holger Jul 11 '19 at 15:24