4

After reading the answers to this old question, I'm curious whether there are now any frameworks that store a large number (millions) of short Strings (15-25 chars long) more efficiently than java.lang.String.

If possible, I would like to represent each string using a byte[] instead of a char[].

My Strings are going to be constants, and I don't really need the numerous utility methods provided by the java.lang.String class.

Rajat Gupta

2 Answers

3

Java 6 does this with -XX:+UseCompressedStrings which is on by default in some updates.

It's not in Java 5.0 or 7. It is still listed as on by default, but it's not actually supported in Java 7. :P

Depending on what you want to do, you could write your own classes, but if you only have a few hundred MB of Strings, I suspect it's not worth it.

Peter Lawrey
  • Any reason why it is not supported in Java 7? – Rajat Gupta Aug 08 '12 at 12:17
  • I have asked a couple of the Oracle developers in person but they said it was something about being too hard to support and/or not a priority. :( If you can use Java 6 that will do it for you. – Peter Lawrey Aug 08 '12 at 12:18
  • I still use Java 6, but looking at the `java.lang.String` class I see that it uses a char[] plus 3 bookkeeping fields for its representation, which bloats each instance well beyond the size of the actual data. – Rajat Gupta Aug 08 '12 at 12:23
  • The header alone for each instance can be 8-16 bytes long. In that case your only option is to use `byte[]` which isn't very friendly. – Peter Lawrey Aug 08 '12 at 12:25
  • 2
    If you don't need to keep references to every instance and can use a number instead, you could build one long byte[] and an index holding the end offset of each String. I did something like this in this class: https://github.com/peter-lawrey/Java-Chronicle/blob/master/src/main/java/vanilla/java/chronicle/impl/IntIndexedChronicle.java It uses off-heap memory to write data efficiently (just 4 bytes of overhead per entry). This link has more detail on the thinking behind the library: https://github.com/peter-lawrey/Java-Chronicle – Peter Lawrey Aug 08 '12 at 12:29
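The "one long byte[] plus an end index per String" idea from the last comment can be sketched roughly as follows. This is not the linked Chronicle class (which uses off-heap memory); it is a minimal on-heap illustration. The class name `PackedStringStore` and its methods are hypothetical, and it assumes every string fits in ISO-8859-1 (one byte per char) and that the total stays under 2 GB.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical append-only store: all strings share one byte[], plus one int end offset per entry. */
final class PackedStringStore {
    private byte[] data = new byte[1024]; // concatenated bytes of all strings
    private int[] ends = new int[64];     // ends[i] = offset just past string i
    private int count = 0;

    /** Appends s and returns the numeric id to use instead of an object reference. */
    int add(String s) {
        // Assumption: Latin-1 content only; other chars would be replaced by '?'.
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        int start = count == 0 ? 0 : ends[count - 1];
        int end = start + bytes.length;
        if (end > data.length) {
            data = Arrays.copyOf(data, Math.max(end, data.length * 2));
        }
        if (count == ends.length) {
            ends = Arrays.copyOf(ends, count * 2);
        }
        System.arraycopy(bytes, 0, data, start, bytes.length);
        ends[count] = end;
        return count++;
    }

    /** Materializes a java.lang.String for the given id only when one is needed. */
    String get(int id) {
        if (id < 0 || id >= count) {
            throw new IndexOutOfBoundsException("id: " + id);
        }
        int start = id == 0 ? 0 : ends[id - 1];
        return new String(data, start, ends[id] - start, StandardCharsets.ISO_8859_1);
    }

    int size() {
        return count;
    }
}
```

The per-entry overhead here is the 4-byte int in `ends`, with no object header or separate array per string. The cost is that callers hold an `int` id rather than a reference, and each `get` allocates a temporary String.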
0

Most likely this optimization is not worth the effort and complexity it brings with it. Either live with what the VM offers you (as Peter Lawrey suggests), or go to great lengths to build your own solution (not using java.lang.String).

There is an interface CharSequence your own String class could implement. Unfortunately very few JRE methods accept a CharSequence, so be prepared that toString() will need to be used frequently on your class if you need to pass any of your 'Strings' to any other API.
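As a rough illustration of such a class, here is a minimal sketch of an immutable, byte[]-backed CharSequence. The name `ByteString` is hypothetical, and the sketch assumes all content fits in ISO-8859-1, so that one byte maps to one char.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical immutable string that stores one byte per char (Latin-1 only). */
final class ByteString implements CharSequence {
    private final byte[] bytes;

    ByteString(String s) {
        // Assumption: Latin-1 content only; other chars would be replaced by '?'.
        this.bytes = s.getBytes(StandardCharsets.ISO_8859_1);
    }

    private ByteString(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public int length() {
        return bytes.length;
    }

    @Override
    public char charAt(int index) {
        return (char) (bytes[index] & 0xFF); // mask to avoid sign extension
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > bytes.length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new ByteString(Arrays.copyOfRange(bytes, start, end));
    }

    /** Needed whenever an API insists on a real java.lang.String. */
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteString && Arrays.equals(bytes, ((ByteString) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

APIs that do accept a CharSequence, such as `StringBuilder.append(CharSequence)`, can take a `ByteString` directly; everything else needs `toString()`, which allocates a regular String each time.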

You could also hack String to create your Strings in a more memory-efficient (and less GC-friendly) way. String has a package-private constructor String(offset, count, char[]) that does not copy the chars but takes the char[] as a direct reference. You could put all your strings into one big char[] and construct the strings using reflection; this would avoid much of the overhead normally introduced by each string having its own char[]. I can't really recommend this method, since it relies on JRE-private functionality.

Durandal