2

To experiment with a performance improvement, I am planning to use strings containing a large number of characters. I want to know whether any size can be considered optimal. For example, if I declare a string array that can hold 5000 strings, and the string in each element holds 5000-7000 characters, will there be any performance degradation?

Please advise.

user1588737
  • 4
    I'm sure the rest of your code is more likely to be a problem. This is usually the kind of micro-optimization that means little. – duffymo Aug 09 '12 at 21:57
  • Performance improvement on what? Did you profile it to confirm that the array solves something? – Alfabravo Aug 09 '12 at 21:58

2 Answers

3

String literals are limited to 65535 bytes because they are stored in the class file's constant pool. I'm not sure whether there is a limit on runtime strings, apart from the obvious limit of 2^31-1 characters imposed by array indexing.

Edit to clear things up: the limit is 65535 bytes in modified UTF-8 encoding. This is the same as standard UTF-8, except that the null character takes two bytes and characters outside the BMP are encoded as a surrogate pair (6 bytes instead of 4). If you're using only ASCII, that is one byte per character.
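To make the byte counting above concrete, here is a minimal sketch that computes how many bytes a string would need in modified UTF-8. The class name `ModifiedUtf8Length` and its methods are hypothetical names chosen for illustration; the cross-check in `main` relies on `DataOutputStream.writeUTF`, which writes a 2-byte length prefix followed by the modified UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper class, for illustration only.
class ModifiedUtf8Length {

    // Maximum payload of a constant-pool UTF-8 entry (its length field is 2 bytes).
    static final int MAX_LITERAL_BYTES = 65535;

    // Counts the bytes the string occupies in modified UTF-8.
    static int modifiedUtf8Length(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1; // ASCII except the null character
            } else if (c <= 0x07FF) {
                bytes += 2; // includes U+0000, which is encoded in two bytes
            } else {
                bytes += 3; // each half of a surrogate pair also lands here (3 + 3 = 6)
            }
        }
        return bytes;
    }

    // True if the string's modified UTF-8 form fits in a single constant-pool entry.
    static boolean fitsInLiteral(String s) {
        return modifiedUtf8Length(s) <= MAX_LITERAL_BYTES;
    }

    public static void main(String[] args) throws IOException {
        String sample = "plain ASCII text";
        System.out.println(modifiedUtf8Length(sample));
        System.out.println(fitsInLiteral(sample));

        // Cross-check against the JDK: writeUTF emits a 2-byte length plus the payload.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(sample);
        System.out.println(out.size() - 2 == modifiedUtf8Length(sample));
    }
}
```

This only measures the encoded size; it says nothing about strings built at run time, which are not subject to the constant-pool limit.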

Antimony
  • Do you have a reference for the maximum length of a string literal? I found a hint, that the constant pool itself can have up to 65535 entries but wasn't able to find a limit for the entry length itself. 65535 *bytes* is equivalent to 32k characters, btw. – Andreas Dolk Aug 09 '12 at 22:11
  • 2
    Dunno, but if you think you need a string literal more than 65K characters, that's probably God's way of telling you to implement things in a different way. – Neil Coffey Aug 09 '12 at 22:13
  • @Andreas, actually it's 65k bytes in modified UTF-8 encoding. Most characters only take 1 byte. As for a reference, check section 4.4.7 of the JVM spec. The length field is only two bytes. – Antimony Aug 09 '12 at 22:19
3

In principle, as Antimony has mentioned, the limit on Strings is the number of characters you can fit in an array, i.e. 2^31-1.

The amount of data that you mention is roughly on the order of 100MB (5000 strings of up to 7000 two-byte chars comes to about 70MB), so it is not a huge problem if you really are staying within that order of magnitude. If you were thinking of using 10+ times as much, you might need to start rethinking things.
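The estimate can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes 2 bytes per `char` and ignores per-object overhead (object headers, array headers, references); `StringArrayFootprint` and `estimateCharBytes` are hypothetical names. On newer JVMs with compact strings, Latin-1-only text may need closer to 1 byte per character, which is why the byte width is a parameter.

```java
// Hypothetical helper class, for illustration only.
class StringArrayFootprint {

    // Bytes of character data for `count` strings of `charsPerString` characters each.
    // Uses long arithmetic so larger inputs do not overflow int.
    static long estimateCharBytes(int count, int charsPerString, int bytesPerChar) {
        return (long) count * charsPerString * bytesPerChar;
    }

    public static void main(String[] args) {
        long low = estimateCharBytes(5000, 5000, 2);  // 5000 strings of 5000 chars
        long high = estimateCharBytes(5000, 7000, 2); // 5000 strings of 7000 chars
        System.out.println("low estimate (bytes):  " + low);
        System.out.println("high estimate (bytes): " + high);
    }
}
```

For the figures in the question this gives 50,000,000 to 70,000,000 bytes of character data, which is consistent with the "order of 100MB" estimate.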

One thing you could consider is declaring your code to pass around CharSequences rather than Strings. You can't subclass String (it is final), but you can create your own class that implements CharSequence if you realise later down the line that doing so can buy you some optimisation (e.g. compressing the internal representation in some way).

Apart from that, I would write the code the way you intend and then profile if you actually hit a performance problem in practice.
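As an illustration of the CharSequence idea, here is a minimal sketch of a custom implementation that stores one byte per character. It assumes the text contains only characters up to U+00FF; `Latin1Sequence` and `countChar` are hypothetical names, and this is one possible compact representation rather than a recommended one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical CharSequence backed by one byte per character (Latin-1 only).
final class Latin1Sequence implements CharSequence {
    private final byte[] data;
    private final int offset;
    private final int length;

    private Latin1Sequence(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    // Builds a sequence from a String, rejecting characters that do not fit in one byte.
    static Latin1Sequence of(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("not Latin-1 at index " + i);
            }
        }
        return new Latin1Sequence(s.getBytes(StandardCharsets.ISO_8859_1), 0, s.length());
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (char) (data[offset + index] & 0xFF); // mask to undo sign extension
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new Latin1Sequence(data, offset + start, end - start); // shares the array
    }

    @Override
    public String toString() {
        return new String(data, offset, length, StandardCharsets.ISO_8859_1);
    }

    // Example consumer: accepts any CharSequence, so String and Latin1Sequence both work.
    static int countChar(CharSequence text, char target) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CharSequence compact = Latin1Sequence.of("performance");
        System.out.println(countChar(compact, 'e'));
        System.out.println(countChar("performance", 'e'));
        System.out.println(compact.subSequence(0, 7));
    }
}
```

Because `countChar` is declared against CharSequence, the caller can switch representations later without changing the method's signature, which is the flexibility described above.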

Neil Coffey