3

I am trying to optimize my stringpad library in Go. So far the only way I have found to fill a string (actually bytes.Buffer) with a known character value (ex. 0 or " ") is with a for loop.

the snippet of code is:

// PadLeft pads string on left side with p, c times
func PadLeft(s string, p string, c int) string {
    var t bytes.Buffer
    if c <= 0 {
        return s
    }
    if len(p) < 1 {
        return s
    }
    for i := 0; i < c; i++ {
        t.WriteString(p)
    }
    t.WriteString(s)
    return t.String()
}

The larger the string pad I believe there is more memory copies of the t buffer. Is there a more elegant way to make a known size buffer with a known value on initialization?

icza
  • 389,944
  • 63
  • 907
  • 827
Peter Preeper
  • 41
  • 1
  • 6

1 Answers1

4

You can only use make() and new() to allocate buffers (byte slices or arrays) that are zeroed. You may use composite literals to obtain slices or arrays that initially contain non-zero values, but you can't describe the initial values dynamically (indices must be constants).

Take inspiration from the similar but very efficient strings.Repeat() function. It repeats the given string with given count:

func Repeat(s string, count int) string {
    // Since we cannot return an error on overflow,
    // we should panic if the repeat will generate
    // an overflow.
    // See Issue golang.org/issue/16237
    if count < 0 {
        panic("strings: negative Repeat count")
    } else if count > 0 && len(s)*count/count != len(s) {
        panic("strings: Repeat count causes overflow")
    }

    b := make([]byte, len(s)*count)
    bp := copy(b, s)
    for bp < len(b) {
        copy(b[bp:], b[:bp])
        bp *= 2
    }
    return string(b)
}

strings.Repeat() does a single allocation to obtain a working buffer (which will be a byte slice []byte), and uses the builtin copy() function to copy the repeatable string. One thing noteworthy is that it uses the working copy and attempts to copy the whole of it incrementally, meaning e.g. if the string has already been copied 4 times, copying this buffer will make it 8 times, etc. This will minimize the calls to copy(). Also the solution takes advantage of that copy() can copy bytes from a string without having to convert it to a byte slice.

What we want is something similar, but we want the result to be prepended to a string.

We can account for that, simply allocating a buffer that is used inside Repeat() plus the length of the string we're left-padding.

The result (without checking the count param):

func PadLeft(s, p string, count int) string {
    ret := make([]byte, len(p)*count+len(s))

    b := ret[:len(p)*count]
    bp := copy(b, p)
    for bp < len(b) {
        copy(b[bp:], b[:bp])
        bp *= 2
    }
    copy(ret[len(b):], s)
    return string(ret)
}

Testing it:

fmt.Println(PadLeft("aa", "x", 1))
fmt.Println(PadLeft("aa", "x", 2))
fmt.Println(PadLeft("abc", "xy", 3))

Output (try it on the Go Playground):

xaa
xxaa
xyxyxyabc

See similar / related question: Is there analog of memset in go?

icza
  • 389,944
  • 63
  • 907
  • 827
  • Thank you. That was the elegance I was looking for, I knew it was out there but I just was not seeing it. This greatly improved the performance of the code by an order of magnitude. When padding 200 characters in my benchmarks it went from 3785 ns/op to 220 ns/op. When padding 4 characters it went from 196 ns/op to 73.6 ns/op. This is very helpful to improve the performance of my library. – Peter Preeper Sep 01 '18 at 23:30