C to Go: Is this long execution time caused by CGO overhead?

Question

I am trying to write an SDK that lets me build custom Alteryx tools in Go. Alteryx provides a C API for these custom tools, and I have been able to use cgo to connect my Go code to the Alteryx engine. However, compared with a similar Python tool built with Alteryx's own Python SDK, the Go tool is noticeably slower: my benchmark Python tool processes 10 million records in ~1 minute, whereas the Go tool takes about 1 minute 20 seconds. I was hoping to see execution speeds much faster than Python. When I run my code in unit tests with everything except the cgo calls, it finishes in 20-30 seconds.

The basic flow of control is something like this:

An upstream tool pushes a record of data (a pointer to a blob of bytes) to my tool by calling a cgo function (iiPushRecord). So, C is calling Go here. My cgo function looks like this:

//export iiPushRecord
func iiPushRecord(handle unsafe.Pointer, record unsafe.Pointer) C.long {
    incomingInterface := pointer.Restore(handle).(IncomingInterface)
    if incomingInterface.PushRecord(record) {
        return C.long(1)
    }
    return C.long(0)
}
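
For context, pointer.Restore in the function above turns the opaque handle that C holds back into a Go value, so every incoming record pays for one lookup plus a type assertion. A minimal pure-Go sketch of that handle pattern follows. The registry type, its save and restore methods, the counter type, and the []byte record parameter are all hypothetical names chosen for illustration; the actual pointer package may work differently.

```go
package main

import (
	"fmt"
	"sync"
)

// IncomingInterface mirrors the interface name used in the question.
// Its single method and the []byte parameter are assumptions of this sketch.
type IncomingInterface interface {
	PushRecord(record []byte) bool
}

// registry is a hypothetical handle table: C would hold only the integer
// key, and Go looks the real value up again on every callback.
type registry struct {
	mu     sync.RWMutex
	next   uintptr
	values map[uintptr]interface{}
}

func newRegistry() *registry {
	return &registry{values: make(map[uintptr]interface{})}
}

// save stores v and returns an opaque handle that contains no Go pointer.
func (r *registry) save(v interface{}) uintptr {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.values[r.next] = v
	return r.next
}

// restore returns the value saved under h, or nil if h is unknown.
func (r *registry) restore(h uintptr) interface{} {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.values[h]
}

// counter is a stand-in tool that only counts the records it receives.
type counter struct{ n int }

func (c *counter) PushRecord(record []byte) bool {
	c.n++
	return len(record) > 0
}

func main() {
	reg := newRegistry()
	c := &counter{}
	h := reg.save(c)

	// Same shape as the exported callback: restore, assert, call.
	ok := reg.restore(h).(IncomingInterface).PushRecord([]byte{1, 2, 3})
	fmt.Println(ok, c.n)
}
```

The lookup itself is cheap compared with crossing the C-to-Go boundary, which is why the question focuses on the runtime branches of the profile rather than on this step.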

My tool does something with the data (in this basic benchmark, it just copies the incoming record into some outgoing buffers).
My tool pushes the record in the outgoing buffers to other tools by calling those tools' own PushRecord method. Go is calling C here. The cgo function looks like this:

func PushRecord(connection *ConnectionInterfaceStruct, record unsafe.Pointer) error {
    result := C.callPushRecord(connection.connection, record)
    if result == C.long(0) {
        return fmt.Errorf(`error calling pII_PushRecord`)
    }
    return nil
}

and the C function it calls (callPushRecord) looks like this:

long callPushRecord(struct IncomingConnectionInterface * connection, void * record) {
    return connection->pII_PushRecord(connection->handle, record);
}

I profiled the code while it was running in Alteryx, with the following results. My question concerns the first point where the diagram branches into 3 parts. All of my code is down the left branch, which accounts for about 40% (about 32 seconds) of total execution time. The other 2 branches (60%, or about 48 seconds) seem to be cgo overhead, in particular the cost of calling Go from C. Did I interpret the profile correctly? If so, is this overhead something I can fix (i.e. have I messed up my cgo code), or is it inherent to the Go runtime, as my research seems to suggest, so that I will not be able to optimize it?

If you want to dig further, all of the code is available on GitHub, and the main part is in the api folder.

Tricky question and as nobody else has answered I will say that in my (limited) experience CGO overhead is small. You should be able to get performance at least as good as Python, but I suspect no better as the Alteryx Python lib is probably optimised and/or mostly in C. — AJR, May 25 '20 at 03:59
Thanks @AJR, I was expecting overhead small enough that I wouldn't notice. And I think that is true for the Go->C calls my code is making (which you can see at the bottom of the profile graph). But it seems like the C->Go calls are a lot more expensive. Or at least, I don't know how else to explain the first branching of my performance profile, where 60% of CPU time is spent on runtime functions and never touches my code. — Tom Larsen, May 25 '20 at 12:21
