1

Working with strings as `String` objects is convenient for many string-processing tasks.

I need to extract some substrings for processing, and the Scala `String` class provides that functionality. But it is rather expensive: a new `String` object is created every time the `substring` function is used. Using tuples `(string: String, start: Int, stop: Int)` solves the performance problem, but makes the code much more complicated.

Is there a library for creating string proxies that store the original string and the range bounds, and that are compatible with other string functions?

ayvango
  • Do you have benchmarks that show strings being slower than tuples, or are you just guessing? – Stuart Cook Sep 16 '11 at 11:18
  • Just guessing. Tuples imply a different way of processing: char-by-char iteration. That would be faster than copying a string in order to process it, but convenient functions such as `startsWith` would have to be implemented by hand. – ayvango Sep 16 '11 at 11:23
  • You say `a new String object is created every time substring function is used`. What makes you say that? Because in general (Sun's Hotspot VM for example) this isn't the case. – Andrzej Doyle Sep 16 '11 at 11:31
  • Strictly speaking, a new string object *is* created every time the substring method is called. However, the substring and the original string share the same character array. – Stuart Cook Sep 16 '11 at 11:35

2 Answers

10

Note: Java 7u6 and later implement `substring` as a copy, not a view, which makes this answer obsolete on those versions.

If you're running your Scala program on the Sun/Oracle JVM, you shouldn't need to perform this optimization, because java.lang.String already does it for you.

A string is stored as a reference to a char array, together with an offset and a length. Substrings share the same underlying array, but with a different offset and/or length.
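
The offset-and-length layout described above can be sketched as a small wrapper class. This is only an illustration under stated assumptions: `StringView` and all of its members are hypothetical names, not part of the Scala or Java standard library, and the class is not how `java.lang.String` is implemented internally. It simply keeps a reference to the original string plus a range, and copies characters only when `toString` is called.

```scala
// Hypothetical StringView: a read-only window over an existing String.
// Illustration only; not a standard-library class.
final class StringView private (source: String, offset: Int, len: Int)
    extends CharSequence {

  def length(): Int = len

  def charAt(index: Int): Char = {
    if (index < 0 || index >= len)
      throw new IndexOutOfBoundsException(index.toString)
    source.charAt(offset + index)
  }

  // Narrowing the window allocates a small wrapper but copies no characters.
  def subSequence(start: Int, end: Int): StringView = {
    if (start < 0 || end > len || start > end)
      throw new IndexOutOfBoundsException(s"[$start, $end)")
    new StringView(source, offset + start, end - start)
  }

  // One example of a String-like helper written against the window.
  def startsWith(prefix: String): Boolean =
    prefix.length <= len && source.regionMatches(offset, prefix, 0, prefix.length)

  // Characters are copied only here, when a real String is required.
  override def toString: String = source.substring(offset, offset + len)
}

object StringView {
  // Wraps the whole string; use subSequence to narrow the range.
  def apply(source: String): StringView = new StringView(source, 0, source.length)
}
```

Because the sketch implements `CharSequence`, it can be passed to APIs that accept that interface, but helpers such as `startsWith` still have to be written by hand, as the question anticipates.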

Stuart Cook
  • We actually had this behaviour bite us not long ago. One of our apps was using more memory than we expected. It turned out we had a collection of short Strings, which had been obtained as substrings of an extremely large String, and they were holding on to the huge char array from the original String. `new String(str)` fixed that easily, but it was an interesting thing to trip up on. – Nick Sep 16 '11 at 11:38
  • It's worth reading the explanation of [why .NET doesn't perform the same substring optimization](http://stackoverflow.com/questions/6742923/if-strings-are-immutable-in-net-then-why-does-substring-take-on-time). – Stuart Cook Sep 16 '11 at 11:47

5

Look at the implementation of String (in particular substring(int beginIndex, int endIndex)): it's already represented as you wish.
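
On JVM versions where `substring` copies its characters, one possible way to get a similar range view from the standard library is `java.nio.CharBuffer.wrap(CharSequence, int, int)`, which returns a read-only buffer over the given range. The sketch below is a suggestion rather than part of the original answer; `CharBufferViewExample` and `view` are hypothetical names chosen for illustration.

```scala
import java.nio.CharBuffer

object CharBufferViewExample {
  // Returns a read-only CharSequence covering text[start, end).
  // The buffer wraps the original string instead of copying its characters.
  def view(text: String, start: Int, end: Int): CharSequence =
    CharBuffer.wrap(text, start, end)
}
```

The result is a `CharSequence`, not a `String`, so it works with APIs that accept `CharSequence` but does not offer `String` methods such as `startsWith`; calling `toString` on it produces an ordinary copied `String`.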

Alexey Romanov