1

In this situation, I pass a pointer from Go to a C function, the C function modifies that pointer value (fills in an array) and I use the same pointer again on the Go code, making sure to call C.free to release it after I am done.

I am sometimes getting a nil reference on that pointer, I am having a hard time understanding why.

Here, the object in Go is cpuTimesC, in C it is the cputicks.

I have also tried making the function return the pointer, with the same results. The weirdest thing is that if I put a printf statement at the end of the function, it takes longer before I eventually get the nil error.

package collector

/*
#cgo LDFLAGS: -lperfstat
#include <stdlib.h>
#include <stdio.h>
#include <libperfstat.h>
#include <string.h>
#include <time.h>

u_longlong_t **ref;

int getCPUTicks(uint64_t **cputicks, size_t *cpu_ticks_len) {
    int i, ncpus, cputotal;
    perfstat_id_t firstcpu;
    perfstat_cpu_t *statp;

    cputotal = perfstat_cpu(NULL, NULL, sizeof(perfstat_cpu_t), 0);
    if (cputotal <= 0){
        return -1;
   }

    statp = calloc(cputotal, sizeof(perfstat_cpu_t));
    if(statp==NULL){
            return -1;
    }
    ncpus = perfstat_cpu(&firstcpu, statp, sizeof(perfstat_cpu_t), cputotal);
    *cpu_ticks_len = ncpus*4;

    *cputicks = (uint64_t *) malloc(sizeof(uint64_t)*(*cpu_ticks_len));
    for (i = 0; i < ncpus; i++) {
        int offset = 4 * i;
        (*cputicks)[offset] = statp[i].user;
        (*cputicks)[offset+1] = statp[i].sys;
        (*cputicks)[offset+2] = statp[i].wait;
        (*cputicks)[offset+3] = statp[i].idle;
    }
    return 0;
}
*/
import "C"

import (
    "errors"
    "unsafe"
    "fmt"
    "github.com/prometheus/client_golang/prometheus"
)

const ClocksPerSec = float64(C.CLK_TCK)
const maxCPUTimesLen = 1024 * 4

type statCollector struct {
    cpu *prometheus.Desc
}

func init() {
    registerCollector("cpu", true, NewCPUCollector)
}

func NewCPUCollector() (Collector, error) {
    return &statCollector{
        cpu: nodeCPUSecondsDesc,
    }, nil
}

func (c *statCollector) Update(ch chan<- prometheus.Metric) error {
    var fieldsCount = 4
    cpuFields := []string{"user", "sys", "wait", "idle"}

    var (
        cpuTimesC       *C.uint64_t
        cpuTimesLength  C.size_t
    )

    if C.getCPUTicks(&cpuTimesC, &cpuTimesLength) == -1 {
        return errors.New("could not retrieve CPU times")
    }
    defer C.free(unsafe.Pointer(cpuTimesC))
    cput := (*[maxCPUTimesLen]C.u_longlong_t)(unsafe.Pointer(cpuTimesC))[:cpuTimesLength:cpuTimesLength]

    cpuTicks := make([]float64, cpuTimesLength)
    for i, value := range cput {
        cpuTicks[i] = float64(value) / ClocksPerSec
    }

    for i, value := range cpuTicks {
        cpux := fmt.Sprintf("CPU %d", i/fieldsCount)
        ch <- prometheus.MustNewConstMetric(c.cpu, prometheus.CounterValue, value, cpux, cpuFields[i%fieldsCount])
    }

    return nil
}

The error is:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x33 addr=0x0 pc=0x1003fcec0]

goroutine 940 [running]:
github.com/dlopes7/aix-prometheus-exporter/collector.(*statCollector).Update(0xa000100000d21b8, 0xa0001000028c480, 0x0, 0x0)
        /home/david/go/src/github.com/dlopes7/aix-prometheus-exporter/collector/cpu_aix.go:81 +0xf0
github.com/dlopes7/aix-prometheus-exporter/collector.execute(0x10043e7e7, 0x3, 0x1101120e0, 0xa000100000d21b8, 0xa0001000028c480)
        /home/david/go/src/github.com/dlopes7/aix-prometheus-exporter/collector/collector.go:95 +0x6c
github.com/dlopes7/aix-prometheus-exporter/collector.AIXCollector.Collect.func1(0xa0001000028c480, 0xa000100000cd440, 0x10043e7e7, 0x3, 0x1101120e0, 0xa000100000d21b8)
        /home/david/go/src/github.com/dlopes7/aix-prometheus-exporter/collector/collector.go:115 +0x4c
created by github.com/dlopes7/aix-prometheus-exporter/collector.AIXCollector.Collect
        /home/david/go/src/github.com/dlopes7/aix-prometheus-exporter/collector/collector.go:114 +0xf8

And I know it happens here, because for some reason cpuTimesC is nil:

cput := (*[maxCPUTimesLen]C.u_longlong_t)(unsafe.Pointer(cpuTimesC))[:cpuTimesLength:cpuTimesLength]

Why would this object be nil sometimes, and how do I make it remain in memory until I call that C.free?

This is on AIX PPC64 with cgo if that makes any difference.

kostix
  • 51,517
  • 14
  • 93
  • 176
David Lopes
  • 46
  • 1
  • 5
  • https://stackoverflow.com/questions/4007268/what-exactly-is-meant-by-de-referencing-a-null-pointer – Gor Stepanyan Dec 08 '19 at 14:46
  • This does not help at all. I am passing a pointer as an argument, the pointer should not be dereferenced because I am calling malloc, it must be freed by calling free, can you expand on why you posted that link? – David Lopes Dec 08 '19 at 14:49
  • The code is heavily inspired on this: https://github.com/prometheus/node_exporter/blob/master/collector/cpu_dragonfly.go and they seem to not get a nil pointer. – David Lopes Dec 08 '19 at 14:57
  • 2
    Is `ncpus` ever zero? (`malloc(0)` is allowed to return `NULL`. AIX is notorious for this.) – torek Dec 08 '19 at 15:03
  • I think you got it, it looks like ncpus is -1 "If I use the return value too fast", if I put a printf after perfstat_cpu I always get the correct value (16 in this case). I wonder if I need to somehow wait for this function or something like that. – David Lopes Dec 08 '19 at 15:18
  • 1
    Compared to [this example](https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/performancetools/idprftools_perfstat_int_cpu.html) your C code is missing copying of the name of the first CPU to the `name` field of the `firstcpu` variable. May it really be the reason for the next call to `perfstat_cpu` to fail (sometimes)? – kostix Dec 08 '19 at 16:48
  • Yeu could try and link with Electric Fence -- it might help to find memory-abuses. http://lzsiga.users.sourceforge.net/aix-linking.html#Q0040 – Lorinczy Zsigmond Dec 08 '19 at 17:13
  • 1
    Thanks @kostik that seems to fix it, I believe I was passing some garbage to that function before copying the name – David Lopes Dec 08 '19 at 23:49

1 Answers1

0

This had nothing to do with pointers being removed from memory, the cputicks was NULL because ncpus was -1 so the malloc call returned NULL, ncpu was -1 because the first parameter of the perfstat_cpu call was not initialized properly, adding:

strcpy(firstcpu.name, FIRST_CPU);

Before the call to perfstat_cpu fixed the issue, as suggested by @kostik

David Lopes
  • 46
  • 1
  • 5