
I have an application that uses a ton of String objects. One of my objects (let's call it Person) contains 9 of them. The data in each String object is written only once, but it will be read several times afterwards. There will be several hundred thousand or so Person objects at a given time, and many of these Person objects will share a first name, last name, etc.

I am trying to think of immediate ways to reduce the amount of memory that is consumed by the Person object, but I am no expert when it comes to how Java manages its memory underneath.

Before I go down this rabbit hole, I would like to know what drawbacks there would be if I went down these paths, and if it even makes sense in the first place:

  • Using StringBuilder or StringBuffer solely because of the trimToSize() method, which would allow me to reduce the number of allocated bytes used in the string.
  • Storing the strings as byte[] arrays and providing a getter that would convert the byte[] to a String and a setter that would accept a String and convert it to byte[]. The data is being read quite a bit, so would this be too expensive?
  • Creating a hash table for (let's just say) "names" that would prevent duplicate allocations (using a pointer) for the same name over and over (there could be thousands of names with 10+ characters).

Before I pointlessly head down any of these roads, does it make sense to do? Maybe Java is already reducing String allocations and checking for duplicates?

I don't mind a good read either. I have found some documentation, but nothing that explores this in such depth.

user0000001
  • Option 3 would be used in conjunction with either option 1 or option 2? Possibly? – user0000001 Jun 07 '18 at 18:21
  • You should probably not look at optimizing strings. Java does that fairly well and if you have problems, think about the broader design of your application. – zapl Jun 07 '18 at 18:28
  • You can use `String.intern` to add the strings to the string pool, and use an already seen value if possible; but I really doubt this is the correct approach. Simply having lots of available memory would be the first approach I'd try. – Andy Turner Jun 07 '18 at 18:28
  • It could help: https://shipilev.net/jvm-anatomy-park/10-string-intern/ – Sergey Morozov Jun 07 '18 at 18:32
  • and this one https://shipilev.net/#string-catechism – Sergey Morozov Jun 07 '18 at 18:32
  • @zapl Optimization is extremely important even if your design allows for sloppy allocations. – user0000001 Jun 07 '18 at 18:33
  • @SergeyMorozov Thank you for this. I will read. Cheers. – user0000001 Jun 07 '18 at 18:33
  • @SergeyMorozov very interesting. – Andy Turner Jun 07 '18 at 18:38
  • @Andy Turner once you have read it, you will never use String.intern() :-) – Sergey Morozov Jun 07 '18 at 18:43
  • Rather than do things with String, why not try and reduce number of Person objects in memory. Consider using an embedded db or similar (e.g. h2 db) and then you can store your Person on disk when not being accessed. – Hitobat Jun 07 '18 at 19:31
  • @Hitobat I'm already doing that. Looking to further optimize is all. – user0000001 Jun 07 '18 at 19:42
  • @user0000001 Yes, optimization is important, but the 80/20 principle applies. Looking at strings without solid evidence from profiling doesn't sound like a good use of time to me. – zapl Jun 11 '18 at 07:49

1 Answer

  1. StringBuilder and StringBuffer won't help in this case. String is an immutable object, and these two classes were introduced for building Strings, not for storing them. You should still use StringBuilder (in most cases you must) whenever you concatenate Strings, insert characters in the middle of them, or delete characters from them.

  2. In my opinion, the second option could lead to increased memory consumption, because a new String is created every time the byte[] is converted back to a String.

  3. A handwritten StringDeduplicator is a very reasonable solution, especially if you are stuck with Java 5, 6, or 7.

  4. Java 8 (since update 8u20) and Java 9 have a String Deduplication option, which is disabled by default. To use it in Java 8, you must also enable the G1 garbage collector; in Java 9, G1 is the default.

    -XX:+UseStringDeduplication
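
For point 3, a hand-written deduplicator can be as small as a map from each string to the first instance seen with that content. The following is only a minimal sketch under some assumptions: the class name `StringDeduplicator` and the methods `dedupe` and `size` are hypothetical names chosen for illustration, and the pool is assumed to live as long as the application does. It uses explicit generic arguments so that it also compiles on older Java versions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper: maps each distinct string value to one canonical instance.
final class StringDeduplicator {

    // Key and value are the same object: the first instance seen for that content.
    private final ConcurrentMap<String, String> pool =
            new ConcurrentHashMap<String, String>();

    /** Returns the canonical instance equal to s, registering s if it is new. */
    String dedupe(String s) {
        if (s == null) {
            return null; // ConcurrentHashMap does not accept null keys
        }
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct string values currently held. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        StringDeduplicator names = new StringDeduplicator();
        // new String(...) forces two distinct objects with equal content.
        String a = names.dedupe(new String("Alexander"));
        String b = names.dedupe(new String("Alexander"));
        System.out.println(a == b);       // true: both refer to the same instance
        System.out.println(names.size()); // 1
    }
}
```

A Person setter would then store `names.dedupe(firstName)` instead of `firstName`, so that equal names share one String object. Note that this sketch holds strong references, so a pooled string is never garbage collected while the pool is reachable; if the set of names changes a lot over time, some form of eviction would be needed.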

Regarding String Deduplication, see:

Sergey Morozov
  • Good Answer, except your # 3 seems to be a contradiction of your # 4. The linked JEP 192 explaining the String Deduplication feature seems to cover all the bases needed by the OP, including the Hashtable, background threading, and support for string interning. Seems unwise to re-invent all that by writing your own String Deduplication feature. – Basil Bourque Jun 07 '18 at 22:40
  • By the way, this talk may be of interest to readers: [*java.lang.String Catechism*](https://www.youtube.com/watch?v=YgGAUGC9ksk) by [Aleksey Shipilev](https://shipilev.net). Discussion of interning & deduplication & G1 [starts at 29:00](https://youtu.be/YgGAUGC9ksk?t=28m58s). – Basil Bourque Jun 07 '18 at 23:09