3

In a legacy code base I am dealing with, there are vast number of String literals. A large number of them are duplicates. For example, the string "userID" is used say in 500 places. There are maybe a thousand such literals which are used in a repeated manner. IntelliJ Idea static code analysis suggests that I extract those as constants. If the IDE does this refactoring automatically for me, without me typing a single line of code, should I go for it?

Is it a good idea, in general, to extract many such duplicate string literals as constants? This will obviously avoid duplication, will provide single point of access, declaration, etc.

However, some of these literals come into picture when accessed. If I declare all literals as constants (static final), then all those will be loaded together. In that context, is it a good idea to declare all those literals as constants? Could you provide some pointers to garbage collection, memory space precautions in such scenarios? What are best practices used in such scenario?

Some notes: I know that string literals are interned. So I don't thing I would be saving any memory in the worst case. Also it seems that jdk 7 will put those strings in heap rather that permgen. I saw a couple of questions like mine, but feel that it is different. So posting it here.

Thanks

Community
  • 1
  • 1
Atul
  • 2,673
  • 2
  • 28
  • 34
  • 5
    Probably. This is my *opinion* (hint hint), but at the bare minimum you'll reduce the amount of mistakes that you'll run in to if you need to use a particular string. It also gives you the opportunity to add context to what this string actually means, too. – Makoto Feb 27 '15 at 07:00

1 Answers1

6
  1. All String Literals are interned automatically. From JDK7+, they will be GCed when the class (actually the classloader which loaded the class which defines the string literal) which defines them gets GCed (provided no other class refers to it (though this happens rarely..). Making them static and final and putting them into a common class is indeed useless from a Memory saving perspective but useful from a design perspective because it will provide a single point of access.

  2. The same String literals are shared across all classes in the JVM. So, there will be no new Strings. Effectively, putting them into one class and accessing them from that place makes your code more structured/ more readable.

My suggestion, don't tinker with legacy code unless it makes a lot of difference. The trade-offs are yours' to choose. :P

TheLostMind
  • 35,966
  • 12
  • 68
  • 104
  • 1
    agree with the `don't tinker with legacy code...` comment – Scary Wombat Feb 27 '15 at 07:07
  • 3
    Exactly; @Atul The real reason to declare strings as constants is to reduce the chance of coding errors due to typos - doing this essentially lets the compiler enforce that you're using the same string everywhere you're supposed to be; also letting you change strings easily in once place and give strings names that have more semantic meaning (also lets you give context to strings with the same value but different purposes). There's no (or minimal) difference as far as memory usage and such goes due to interning, though. If the legacy code works, there's no reason to make this type of change. – Jason C Feb 27 '15 at 07:08
  • @ScaryWombat - I would never tinker with legacy code that works :P – TheLostMind Feb 27 '15 at 07:10
  • @JasonC - *If the legacy code works, there's no reason to make this type of change* :P – TheLostMind Feb 27 '15 at 07:11
  • 1
    Agree with legacy code part. But if the IDE does automatic refactoring of this extract constant in a single go, would it still be a bad idea? – Atul Feb 27 '15 at 07:11
  • 4
    @Atul Most likely. Making that change now, and doing it *right*, could be far more tedious and complicated than a simple find+replace. For example, you'd want to verify that every occurrence of "userId" actually means the same thing. If one is e.g. a database column name and another is an unrelated but coincidentally identical, say, key in a configuration file, you *potentially* make things worse by accidentally combining them into the same constant when they are semantically different. You'd have to analyze all of that. – Jason C Feb 27 '15 at 07:12
  • 1
    @Atul - To add to Jason's comment, there could be *multiple* `"userid"` strings that could be *unrelated*. i.e, tomorrow a customer might ask you to change it to `"username"` then there will be a problem of actually searching and replacing the *correct* userid strings with username.. – TheLostMind Feb 27 '15 at 07:26