Why this simple Go program is slower than its Node.js counterpart?

Question

I'm attempting to use Go to implement a binary tree with values on the leaf, i.e., equivalent to:

data Tree a 
  = Node {left: Tree, right: Tree} 
  | Leaf {value: a}

I had two problems: 1, I couldn't figure a way to make a type with multiple constructors, so I had to fit all data in one. 2, I couldn't make it polymorphic, so I had to use interface{} (which I guess is an "opt-out" of the type-system?). This is the best I could make:

package main

import ("fmt")

type Tree struct {
  IsLeaf bool
  Left *Tree
  Value interface{}
  Right *Tree
}

func build(n int) *Tree {
  if (n == 0) {
    return &Tree{IsLeaf: true, Left: nil, Value: 1, Right: nil}
  } else {
    return &Tree{IsLeaf: false, Left: build(n - 1), Value: 0, Right: build(n - 1)}
  }
}

func sum(tree *Tree) int {
  if (tree.IsLeaf) {
    return tree.Value.(int)
  } else {
    return sum(tree.Left) + sum(tree.Right)
  }
}

func main() {
  fmt.Println(sum(build(23)))
}

It implements the type and tests it by summing over a huge generated tree. I've proceeded to make an equivalent implementation in JavaScript (including the redundant data on constructors, for fairness):

const build = n => {
  if (n === 0) {
    return {IsLeaf: true, Value: 1, Left: null, Right: null};
  } else {
    return {IsLeaf: false, Value: 0, Left: build(n - 1), Right: build(n - 1)};
  }
}

const sum = tree => {
  if (tree.IsLeaf) {
    return tree.Value;
  } else {
    return sum(tree.Left) + sum(tree.Right);
  }
}

console.log(sum(build(23)));

I've compiled the Go code with go build test.go and ran it with time ./test. I've ran the Node.js code with node test.js. After several tests, the Go program ran in about 2.5 seconds in average, versus 1.0 seconds of the Node.js one.

That makes Go 2.5x slower than Node.js for this simple program, which can't be correct, given that Go is a statically-typed, compiled language with a mature compiler, whereas JavaScript is an untyped, interpreted one.

Why is my Go program so slow? Am I missing some compiler flag, or is the code problematic?

What version of Go? Also what happens if you don't use an interface for a value.. You lose some speed by type casting Value to an int. — reticentroot, Jun 20 '17 at 02:02
@reticentroot Go 1.8. Tested with `int` instead of an `interface`, still about `2x` slower than Node.js. (But how do I make it polymorphic without it?) - Is there an equivalent to the `-O2` flag? — MaiaVictor, Jun 20 '17 at 02:02
I think the real issue is the implementation and the way Go handles recursion https://medium.com/@felipedutratine/iterative-vs-recursive-vs-tail-recursive-in-golang-c196ca5fd489 — reticentroot, Jun 20 '17 at 02:08
@reticentroot It is on the question, I'm using `go build file.go; time ./file`. — MaiaVictor, Jun 20 '17 at 02:08
Recursion is slow in go: See: https://medium.com/@felipedutratine/iterative-vs-recursive-vs-tail-recursive-in-golang-c196ca5fd489 — Jack, Jun 20 '17 at 02:10
You might want to take a look at this tree implementation which takes advantage of some of Go's features. You might see a speed gain by implementing it with channels https://golang.org/doc/play/tree.go — reticentroot, Jun 20 '17 at 02:14
I think the major problem with this code is it's javascript written in Go. — Timothy Jones, Jun 20 '17 at 05:07
@TimothyJones A simple recursion over a tree is now a JavaScript concept? I think you have things very reversed, the JS code here is adapted to fit Go's semantics. A more idiomatic JS implementation would be [3 lines of code](http://lpaste.net/356373) and is 5x faster than the Go one. Recursion and binary trees are absolutely universal concepts, I don't think that is up to debate. If a lang can't handle trees I have no hope for it. — MaiaVictor, Jun 20 '17 at 05:14
I think the interpretation of the object model is the problem here. I'm writing you up a more detailed answer. — Timothy Jones, Jun 20 '17 at 05:17
@TimothyJones fair enough, I'm curious to see what you got to show. — MaiaVictor, Jun 20 '17 at 05:18

Timothy Jones · Accepted Answer · 2017-06-20T07:25:55.303

Summary

This code is slower because of the type assertion, and reduntant data.

Go doesn't encourage you to write type assertions in hot places:

tree.Value.(int)

Take out this type assertion (and accordingly change Value to an int type), and your code will perform about twice as fast (which should be around the speed of your node example).

Take out the redundant data as well, and your code will perform about three times as fast. See the playground example at the end of the post.

Details

I think this is a mistake of design, rather than implementation. Reading your question, I think there is some confusion about how Go's type system works.

Go's object model doesn't encourage you to do polymorphism using catch-all types (see the top half of this excellent answer for a discussion of Go's polymorphism).

In a JavaScript world, each object is a specific type. In Go, a struct can be treated as a specific interface type if it fulfils the interface's contract. Note that structs are not objects - what you called constructors are just struct initialisers.

It is possible to write Go code that operates on interface{} as a placeholder for all types, but the language doesn't really encourage you to write code this way (as you pointed out in your question, it was a challenge to write clean code in the way you would write it in JavaScript).

Because Go doesn't really have objects, trying to write code that feels very object-oriented in Go will be challenging (additionally, Go doesn't have standard inheritance or method overloading). For this reason, I don't think that your code is the kind of code that Go encourages the programmer to write. So, it's not a fair test.

Type assertion is slow. (I'm not across the design of Go's internals, but certainly this indicates that the programmer is not expected to write a lot of type assertions). Because of this, it's not surprising that your code is not performant. I changed your code to:

type Tree struct {
  IsLeaf bool
  Left *Tree
  Value int
  Right *Tree
} 
 .....
func sum(tree *Tree) int {
  if (tree.IsLeaf) {
    return tree.Value
  } else {
    return sum(tree.Left) + sum(tree.Right)
  }
}

And achieved a 2x speed up on my machine.

There are probably other optimisations - you might be able to remove IsLeaf, and you don't need to store values at non-leaf nodes (or alternatively, you could distribute values throughout the tree, so never waste the Value). I don't know whether JavaScript optimises out these unnecessary Values, but I don't believe Go does.

So, I think your code is using much more memory than it needs, which won't help performance either.

Does it matter?

I'm not personally convinced by "I wrote this program in X and Y, and found that Y was slower", especially as it's hard to compare fairly across frameworks. There are so many other sources of variance - programmer knowledge, machine load, spin-up time, etc.

To do a fair test you'd need to write code that's idiomatic in each language, but also use the same code. I don't think it's realistic to achieve both.

If this code is your specific scenario, and performance is the primary goal, then this test might be helpful. But, otherwise I don't think it's a very meaningful comparison.

At scale, I would expect other considerations to beat how fast you can create and traverse a tree. There are technical concerns like data throughput and performance under load, but also softer concerns like programmer time, and maintenance effort.

The academic exercise is interesting, though. And writing code like this is a good way to find the edges of a framework.

Edit: I tried making your code more Go-like, which has the added advantage of a 3x speedup over the original.:

https://play.golang.org/p/mWaO3WR6pw

The tree is a bit heavy for the playground, but you can copy and paste the code to run locally.

There are more optimisations possible that I haven't tried, such as parallel construction of the tree.

You may be able to extend this design to have the polymorphic behaviour that you want (by providing alternative Leaf implementations), but I'm not sure what Sum() means for non-number types. Not knowing how to define Sum() is a good example of the kind of thinking that leads to not deciding to include polymorphism through generics.

For those new to Go (who may be upset by this different way of looking at polymophism), Go has type composition and a number of other powerful concepts, like the `interface` system - but these concepts encourage a different way of constructing programs. — Timothy Jones, Jun 20 '17 at 05:58
That is a great response, but leaves me without a sense of satisfaction. Even disabling polymorphism (which is *not* reasonable, IMO), it is still slower than Node.js, which is quite disappointing for a compiled language! I'm also quite disappointed that the solution to polymorphism is basically a wildcard that says "disable the typechecker at this location"... that's not powerful, that's just bad. Polymorphism is a solved problem since ML! (Note: I didn't really use any OOP concept on the code at all, it is just plain data and recursion; if anything, that's a functional style.) — MaiaVictor, Jun 20 '17 at 06:03
Go doesn't encourage you to disable the type checker- I think it's the reverse. It encourages you to have concrete implementations that are not generic. — Timothy Jones, Jun 20 '17 at 06:55
I'm not arguing for disabling polymorphism, I'm saying that Go doesn't encourage you to write code that is polymorphic *in the way you've written above*. There are other ways to write the same code, but generally, Go doesn't let you do generic typing. I initially found this frustrating, but after using it for a while, I no longer miss it. It's just a different style. — Timothy Jones, Jun 20 '17 at 06:58
See the edit for a more idiomatic Go tree, which causes a further speed improvement. — Timothy Jones, Jun 20 '17 at 07:12
How it is "just a different style"? I'm not talking about style, I'm talking about a practical problem. Suppose I've just implemented a container of some type (say, a List of ints). Then I want to reuse that container's methods (concat, reverse, etc.) for another type (say, string). How do I do that in Go? That's the problem polymorphism solves. You say there is a solution for that, but I see no solution yet. I've posted a [separate question](https://stackoverflow.com/questions/44646378/how-do-you-implement-a-container-in-go-without-losing-efficiency-and-type-safety) for that. — MaiaVictor, Jun 20 '17 at 07:25
Also, the [JS equivalent](http://lpaste.net/356373) of your last examples is still faster than Go by the same margin... — MaiaVictor, Jun 20 '17 at 07:31
Maybe it's just faster for this case. I don't know enough about node to know whether the test is fair. Either way, I don't think anyone is choosing node (or Go) for raw data structure performance. — Timothy Jones, Jun 20 '17 at 07:33
If performance is not one of the selling points, then in what sense Go is better than Node.js? As I see, it is just JavaScript with added verbosity, same-level (or less) performance and a "type-system" that is so weak it gives no real safety, yet makes expressing something as necessary as lists impossible (without disabling the type system). I with I did, but I don't get the point. (I don't think this is the place for this, though, so, sorry. I asked a question and you answered it well, so I guess that's it...) — MaiaVictor, Jun 20 '17 at 07:38
You are saying go has a weak type system... from someone doing javascript.... hahaha. — RickyA, Jun 20 '17 at 15:36
@RickyA TypeScript and PureScript both solve the lack of types on JavaScript and they're better than Go's type system in every imaginable aspect. — MaiaVictor, Jun 21 '17 at 21:03

reticentroot · Answer 2 · 2017-06-21T21:19:04.953

I thought that this might be beneficial. This is my implementation of a balanced binary tree, which uses recursion, go routines, and channels. It was meant to be used as a package, which is why i use Exported and Un-exported functions. The exported functions are what you should use/mod etc.. I wrote it a long time ago... there are plenty of things that could have been written better.. I added a Sum function just now however for you. I added 23 nodes and got the sum in 1/4 a second..

UPDATE I've added a new function called GetTreeTotal() if you look at the Tree struct i keep a Total field now. In the Add() function I update that field as the node is being added. Now sum() doesn't have to be calculated in mass, thats just part is the Tree's meta data now.. So in that sense. super fast. Using similar logic, number of nodes on the tree can be kept as meta data as well.. Knowing that information, would speed up functions like TreeToArray() because one could then define the size of the slice before hand. Less allocations.. etc

UPDATE2 This question got me curious, I rewrote the code below and turned it into a package. https://github.com/marcsantiago/GoTree Iterative inserts are almost 3x faster (benchmarks included), though you really see this difference when the amount of inserts is really high.

package main

import (
    "encoding/json"
    "errors"
    "fmt"
    "math/rand"
    "sync"
    "time"
)

type node struct {
    Left  *node
    Right *node
    Data  int
}

// Tree ...
type Tree struct {
    Root  *node
    Total int
}

// FindNode ...
func (t *Tree) FindNode(data int) bool {
    newNode := node{
        Data: data,
    }
    if t.Root != nil {
        if t.findNode(t.Root, newNode) != nil {
            return true
        }
    }
    return false
}

func (t *Tree) findNode(search *node, target node) *node {
    var returnNode *node
    if search == nil {
        return returnNode
    }
    if search.Data == target.Data {
        return search
    }
    returnNode = t.findNode(search.Left, target)
    if returnNode == nil {
        returnNode = t.findNode(search.Right, target)
    }
    return returnNode
}

// Add ...
func (t *Tree) Add(data int) {
    t.Total += data
    if data < 0 {
        panic(errors.New("Only submit positive integers"))
    }
    nodeToAdd := node{
        Data: data,
    }
    if t.Root == nil {
        t.Root = new(node)
    }
    if t.Root.Data == 0 {
        t.Root = &nodeToAdd
        return
    }

    t.add(t.Root, nodeToAdd)
    return
}

func (t *Tree) add(oldnode *node, newNode node) {
    if newNode.Data < oldnode.Data {
        if oldnode.Left == nil {
            // t.Total += newNode.Data
            oldnode.Left = &newNode
        } else {
            // t.Total += newNode.Data
            t.add(oldnode.Left, newNode)
        }
    } else if newNode.Data > oldnode.Data {
        if oldnode.Right == nil {
            // t.Total += newNode.Data
            oldnode.Right = &newNode
        } else {
            // t.Total += newNode.Data
            t.add(oldnode.Right, newNode)
        }
    }
    return
}

// InOrderTraversal ...
func (t *Tree) InOrderTraversal() {
    if t.Root != nil {
        currentNode := t.Root
        if currentNode.Left == nil && currentNode.Right == nil {
            fmt.Println(currentNode.Data)
        } else {
            t.inOrderTraversal(currentNode)
        }
    }
    return
}

func (t *Tree) inOrderTraversal(n *node) {
    if n.Left != nil {
        t.inOrderTraversal(n.Left)
    }
    fmt.Println(n.Data)
    if n.Right != nil {
        t.inOrderTraversal(n.Right)
    }
    return
}

// Traversal ...
func (t *Tree) Traversal() {
    if t.Root != nil {
        currentNode := t.Root
        if currentNode.Left == nil && currentNode.Right == nil {
            fmt.Println(currentNode.Data)
        } else {
            t.traversal(currentNode)
        }
    }
    return
}

func (t *Tree) traversal(n *node) {
    fmt.Println(n.Data)
    if n.Left != nil {
        t.traversal(n.Left)
    }

    if n.Right != nil {
        t.traversal(n.Right)
    }
    return
}

// Sum ...
func (t *Tree) Sum() (total int) {
    var wg sync.WaitGroup
    c := make(chan int, 100)
    if t.Root != nil {
        currentNode := t.Root
        if currentNode.Left == nil && currentNode.Right == nil {
            return 1
        }
        wg.Add(1)
        t.sum(currentNode, c, &wg)
    }
    go func() {
        wg.Wait()
        close(c)
    }()
    for n := range c {
        total += n
    }
    return total
}

func (t *Tree) sum(n *node, counter chan int, wg *sync.WaitGroup) {
    defer wg.Done()

    if n.Left != nil {
        wg.Add(1)
        go t.sum(n.Left, counter, wg)
    }

    counter <- n.Data

    if n.Right != nil {
        wg.Add(1)
        go t.sum(n.Right, counter, wg)
    }

    return
}

// CountEdges ...
func (t *Tree) CountEdges() (edges int) {
    c := make(chan int, 10)
    if t.Root != nil {
        currentNode := t.Root
        if currentNode.Left == nil && currentNode.Right == nil {
            return 1
        }
        t.countEdges(currentNode, c)
    }

    for {
        n := <-c
        if n == 0 {
            close(c)
            break
        }
        edges++
    }
    return edges
}

func (t *Tree) countEdges(n *node, counter chan int) {
    if n.Left != nil {
        go t.countEdges(n.Left, counter)
    }

    if n.Left == nil && n.Right == nil {
        counter <- 0
    } else {
        counter <- 1
    }

    if n.Right != nil {
        go t.countEdges(n.Right, counter)
    }
    return
}

// GenerateRandomTree ...
func (t *Tree) GenerateRandomTree() {
    u := time.Now()
    source := rand.NewSource(u.Unix())
    r := rand.New(source)
    arr := r.Perm(1000)
    for _, a := range arr {
        t.Add(a)
    }
    return
}

// GetRootData ...
func (t *Tree) GetRootData() int {
    return t.Root.Data
}

// GetTreeTotal ...
func (t *Tree) GetTreeTotal() int {
    return t.Total
}

// TreeToArray ...
func (t *Tree) TreeToArray() []int {
    ch := make(chan int, 10)
    arr := []int{}
    if t.Root != nil {
        currentNode := t.Root
        if currentNode.Left == nil && currentNode.Right == nil {
            return []int{currentNode.Data}
        }
        t.traversalGetVals(currentNode, ch)
    }

    for {
        n := <-ch
        if n == -1 {
            close(ch)
            break
        }
        arr = append(arr, n)
    }
    return arr
}

func (t *Tree) traversalGetVals(n *node, ch chan int) {
    if n.Left != nil {
        ch <- n.Left.Data
        go t.traversalGetVals(n.Left, ch)
    }

    if n.Right != nil {
        ch <- n.Right.Data
        go t.traversalGetVals(n.Right, ch)
    }
    if n.Left == nil && n.Right == nil {
        ch <- -1
    }
    return
}

// ShiftRoot ...
func (t *Tree) ShiftRoot(newRoot int) {
    arr := t.TreeToArray()
    n := Tree{}
    n.Add(newRoot)
    for _, i := range arr {
        n.Add(i)
    }
    *t = n
}

// PrintTree ...
func (t *Tree) PrintTree() {
    b, err := json.MarshalIndent(t, "", " ")
    if err != nil {
        panic(err)
    }
    fmt.Println(string(b))
}

func main() {
    // t := Tree{}
    // t.GenerateRandomTree()
    // t.PrintTree()
    // fmt.Println("total:", t.Sum())

    t := Tree{}
    t.Add(10)
    t.Add(100)
    t.Add(2)
    t.Add(3)

    fmt.Println(t.Sum()) // should be 115
    fmt.Println(t.GetTreeTotal())

    // t := Tree{}
    // for i := 1; i <= 23; i++ {
    //  t.Add(i)
    // }
    // fmt.Println("total:", t.Sum())

}

That's impressive! I didn't explore the realms of go routines yet. But notice there are 16777216 (i.e., `2^23`), not 23 leaf nodes on the tests I posted. — MaiaVictor, Jun 20 '17 at 03:08
I didn't realize that.. lol, in that case some more work would have to go into the sum function. For one i'm using a buffered channel of ten... might be better to bump that up... though I think it would be best to keep track of the total number as the number of added to the tree.. So in the Tree struct, add Total next to Root.... so that that you aren't calculating everything at once. — reticentroot, Jun 20 '17 at 03:14
@MaiaVictor I've updated the post.. now Sum doesn't have to be calculated. It can be queried for. So now the limiting factor is how fast nodes can be added. — reticentroot, Jun 20 '17 at 03:30

RickyA · Answer 3 · 2017-06-20T19:24:08.410

The problem is mainly in the fragmented memory allocation (via the recursive stack). This causes a lot of small allocations and subsequently the garbage collector has a hefty job. You can circumvent this by pre allocate an array that holds all nodes and keep a running index for assignment:

bar.go

package bar

type Tree struct {
    Left  *Tree
    Value int
    Right *Tree
    IsLeaf bool
}

func build(level int, curridx *int, src *[]Tree) *Tree {
    if level == 0 {
        (*src)[*curridx] = Tree{Left: nil, Value: 1, Right: nil, IsLeaf:true}
        *curridx++
        return &(*src)[*curridx-1]
    } else {
        (*src)[*curridx] = Tree{Left: build(level-1, curridx, src), Value: 1, Right: build(level-1, curridx, src)}
        *curridx++
        return &(*src)[*curridx-1]
    }
}

func sum(tree *Tree) int {
    if (tree.IsLeaf) {
        return tree.Value.(int)
    } else {
        return sum(tree.Left) + sum(tree.Right)
    }
}

bar_test.go

package bar

import "testing"
import "math"

func TestMe(t *testing.T) {
    for x := 0; x < 10; x++ {
        levels := 23
        nrnodes := int(math.Pow(2.0, float64(levels+1))) //there are actually 24 levels
        mapping := make([]Tree, nrnodes, nrnodes)
        index := 0
        t.Error(sum(build(levels, &index, &mapping)))
    }
}

This will speed things up to 0.5 sec per iteration.

Note the build in profiling of this:

go test -cpuprofile cpu.out and go tool pprof cpu.out + web

The `sum` function here is not correct. Notice the datatype, you're supposed to only sum the leaf nodes. The only reason there is a `Value` on non-leaf nodes to begin with is that I couldn't find how to express an **union type** in Go, so I had to use a single struct and, thus, include `Value int` in branch nodes. — MaiaVictor, Jun 20 '17 at 19:12

Why this simple Go program is slower than its Node.js counterpart?

3 Answers3

Summary

Details

Does it matter?