According to the Strings section of the Go Data Structures article, taking a slice of a string keeps the original string in memory.

"(As an aside, there is a well-known gotcha in Java and other languages that when you slice a string to save a small piece, the reference to the original keeps the entire original string in memory even though only a small amount is still needed. Go has this gotcha too. The alternative, which we tried and rejected, is to make string slicing so expensive—an allocation and a copy—that most programs avoid it.)"

So if we have a very long string:

s := "Some very long string..."

And we take a small slice:

newS := s[5:9]

The original s will not be released until we also release newS. Considering this, what is the proper approach to take if we need to keep newS long term, but release s for garbage collection?

I thought maybe this:

newS := string([]byte(s[5:9]))

But I wasn't certain if that would actually work, or if there's a better way.

the system
  • It should be noted that Java abandoned this optimization because the gain was too small compared to the bugs it caused in user code. See this [related question](http://stackoverflow.com/questions/13803505/why-string-class-has-copy-constructor). In Go the problem is different, as slicing has very specific semantics that should be learned as one of the basics of the language. – Denys Séguret Jan 18 '13 at 09:32

2 Answers

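Yes, converting to a slice of bytes creates a copy of the string data, and converting back to a string copies it again, so the result no longer references the original string. The original can then be GCed somewhere down the line, once nothing else refers to it.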

As a "proof" of this (well, it proves that the slice of bytes doesn't share the same underlying data as the original string):

http://play.golang.org/p/pwGrlETibj

Edit: and proof that the slice of bytes only has the necessary length and capacity (in other words, it doesn't have a capacity equal to that of the original string):

http://play.golang.org/p/3pwZtCgtWv

Edit2: And you can clearly see what happens with the memory profiling. In reuseString(), the memory used is very stable. In copyString(), it grows fast, showing the copies of the string done by the []byte conversion.

http://play.golang.org/p/kDRjePCkXq
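The linked playground code is not reproduced here. As a rough stand-in, this sketch compares how many heap bytes are allocated when keeping plain slices versus keeping copies, using the `TotalAlloc` counter from `runtime.MemStats`. The names `allocatedBy`, `keepSlices`, `keepCopies`, and `sink` are hypothetical and chosen for this illustration only.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// sink is package-level so the results escape to the heap.
var sink []string

// allocatedBy returns the number of heap bytes allocated while f ran,
// measured as the change in the cumulative TotalAlloc counter.
func allocatedBy(f func()) uint64 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	f()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

// keepSlices stores n plain slices of s; no string data is copied.
func keepSlices(s string, n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, s[5:9])
	}
	return out
}

// keepCopies stores n independent copies of the same four bytes.
func keepCopies(s string, n int) []string {
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, string([]byte(s[5:9])))
	}
	return out
}

func main() {
	s := strings.Repeat("x", 1<<20)
	const n = 10000

	plain := allocatedBy(func() { sink = keepSlices(s, n) })
	copied := allocatedBy(func() { sink = keepCopies(s, n) })

	fmt.Println("bytes allocated, slices:", plain)
	fmt.Println("bytes allocated, copies:", copied)
}
```

Both runs pay for the `[]string` itself; the copying run additionally allocates the string data for every element, which is the growth described above. Exact numbers depend on the Go version and allocator.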

mna
  • Could it make a difference that in your examples, the new `[]byte` is being assigned to a variable instead of being inline like mine? – the system Jan 17 '13 at 21:00
  • No, see the second edit in my answer (proof via memory stats). – mna Jan 17 '13 at 21:12
  • And the previous Go question (and answer) is about this behaviour, so you may want to take a look at http://stackoverflow.com/a/14373714/1094941 – mna Jan 17 '13 at 21:14

The proper way to make sure a string can eventually become eligible for garbage collection after you slice it and keep the slice "live" is to create a copy of the slice and keep the copy "live" instead. But now you are buying better memory performance at the cost of worse time performance. That might be good in one place and evil in another. Sometimes only proper measurements, not guessing, will tell where the real gain is.
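To put a number on the memory side of that trade-off, here is a small sketch, under assumptions of my own, that keeps a four-byte piece of many large strings and reports the live heap after a forced collection. `retainedAfterGC` is a hypothetical helper, and the counts and sizes are arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// retainedAfterGC builds n large strings, keeps only what keep returns for
// each one, forces a collection, and reports the live heap size in bytes.
func retainedAfterGC(keep func(big string) string) uint64 {
	const n, size = 32, 1 << 20 // 32 strings of 1 MiB each (arbitrary)
	kept := make([]string, 0, n)
	for i := 0; i < n; i++ {
		big := strings.Repeat("x", size)
		kept = append(kept, keep(big))
	}
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	runtime.KeepAlive(kept) // the kept pieces must stay live until measured
	return m.HeapAlloc
}

func main() {
	// Keeping a plain slice pins every 1 MiB original.
	shared := retainedAfterGC(func(big string) string {
		return big[5:9]
	})
	// Keeping a copy lets each original be collected.
	copied := retainedAfterGC(func(big string) string {
		return string([]byte(big[5:9]))
	})

	fmt.Printf("live heap keeping slices: %d KiB\n", shared/1024)
	fmt.Printf("live heap keeping copies: %d KiB\n", copied/1024)
}
```

The time side is the extra allocation and copy per kept piece; whether that matters is exactly what a benchmark on the real workload should decide.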

I'm, for example, using StrPack, when I prefer a bit of evilness ;-)

zzzz
  • Thanks. Yeah, what I had in mind was a situation where a very small slice would be taken from a large string that isn't otherwise needed. Mostly I wanted to know the proper technique. I see the `StrPack` uses the same thing. Thanks again. – the system Jan 17 '13 at 21:51
  • You mean `newS := StrPack(s[5:9])`? ;-) – zzzz Jan 18 '13 at 09:10