5

I want a representation for strings with fast concatenation and editing operations. I have read the paper "Ropes: an Alternative to Strings", but have there been any significant improvements in this area since 1995?

EDIT: One possibility I've considered before is using a 2-3 finger tree with strings as leaves, but I have not done a detailed analysis of this; this gives amortized constant-time addition/deletion on ends and logarithmic (in the number of chunks of the smaller string) concatenation, as opposed to vice versa for ropes.

Alexey Romanov
  • 167,066
  • 35
  • 309
  • 487
  • 1
    I came over this topic for a few seconds from http://wiki.sharpdevelop.net/AvalonEdit.ashx, and want to know exactly the same thing :-) Let's see... – jdehaan Jun 14 '10 at 18:30
  • What sort of improvements are you hoping to find? – Justin Peel Jun 14 '10 at 18:51
  • Faster asymptotics, or constant factors, or less memory use. – Alexey Romanov Jun 14 '10 at 18:53
  • 1
    Might not be relevant to the answer you seek, but a few questions. How do you plan to use these ropes? What do you exactly mean by 'edit'? Would you iterate over the strings? Would you access a character by index (like an array)? How frequent are those operations? Is there any specific ordering of the operations? (like concats before edits etc). Or are you just talking in general terms and are just curious? –  Jun 15 '10 at 01:22
  • "What do you exactly mean by 'edit'?" Good question, for which I don't have a general answer. "Would you iterate over the strings?" Yes. "Would you access a character by index (like an array)?" No. "Is there any specific ordering of the operations? (like concats before edits etc)." For a general use string representation, no. But there is a use case in which concats _mostly_ come before anything else. – Alexey Romanov Jun 19 '10 at 13:07

1 Answers1

1

This is an old question! I wonder if anyone reads this. But still it's intrigueing. In your comments, you say you look for:

Faster asymptotics, or constant factors, or less memory use

Well, ropes have O(1) insertion, and O(n) iteration. You can't do better than that. Substrings and indexing is obviously going to be more costly. But most use cases for large documents don't require editing or random access. If you only concatenate at the end, a 1D vector/list of strings could improve the insertion time constant. I used to use this in JavaScript because it had such slow string concatentation.

It is said that memory representation is less efficient than using strings. I doubt that: If you work in a language that has garbage collection, the rope allows you to use the same string fragment instance in multiple places. In a rope that represents a HTML document, there will be many DIV's, SPAN's and LINK elements. This might even happen automatically assuming these tags are compile time constants, and you add them to the rope directly. Even for such short phrases, the rope document will reduce in size significantly, to the same order of magnitude as the original string. Longer strings will produce a net gain.

If you also make the tree elemenst read only, you can create subropes (longer phrases expressed as ropes), that occur multiple times or are shared across rope based strings. The downside of this sharing is that such shard rope sections can't be changed: to edit them, or to balance the tree you need to copy the object graph. But that does not matter if you mostly concatenate and iterate. In a web server, you can keep a subrope that repesents the CSS stylesheet declaration that is shared across all HTML documents served by that server.

  • Well, I am reading :) "You can't do better than that." But I can do, e.g. O(1) concatenation (and still O(n) iteration). I am, of course, aware that persistent data structures allow sharing. – Alexey Romanov Feb 04 '11 at 21:28