I'm testing cgo, and every simple hello-world-like program works well,
but I have a problem with the C code below.
The C code traverses a directory tree and sums the file sizes.
If I build it with the go command, the build succeeds with no errors,
but when I run it, a "segmentation violation" error occurs:

bash$ ./walkdir
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1 pc=0x7f631e077c1a]
. . . .

-------------------------------------------------------------

package main
/*
#include <stdint.h>
#include <fts.h>
#include <sys/stat.h>

uintmax_t get_total_size(char *path)
{
    uintmax_t total_size = 0;
    FTS *fts = fts_open(&path, FTS_PHYSICAL, NULL);
    FTSENT *fent;
    while ((fent = fts_read(fts)) != NULL)
        if (fent->fts_info == FTS_F)
            total_size += fent->fts_statp->st_size;
    fts_close(fts);
    return total_size;
}
*/
import "C"
import "fmt"

func main() {
    fmt.Println(C.get_total_size(C.CString("/usr")))
}
mug896
  • Are you sure about the call to `fts_open`? In your example it takes `&path`, and in this code `path` contains the address of the first byte of the block allocated by the `C.CString("/usr")` call, that is, it points at `/`. So `fts_open` gets passed the address of the variable containing the address of `/`, which looks like an error to me. Since I have no idea what the code of `fts_open` looks like, I'm just guessing. If it has a lax signature with `void *` or `const void *` for its first argument, it will compile just fine. – kostix Aug 15 '21 at 16:55
  • OK, it turned out the `fts_*` functions are from `glibc`, and yes, `fts_open` does indeed take a pointer to a pointer as its first argument. Posted an answer. – kostix Aug 15 '21 at 17:35

2 Answers

fts_open is defined like this:

fts_open()
The fts_open() function takes a pointer to an array of character pointers naming one or more paths which make up a logical file hierarchy to be traversed. The array must be terminated by a null pointer.

C does not pass arrays to functions as such: an array argument decays to a pointer to its first element, so the callee cannot know the array's length. That is why fts_open requires the terminating NULL pointer. In your case you pass fts_open a single valid pointer, but it is not located in an array whose next element is NULL. So fts_open keeps scanning the memory past &path, treating whatever it finds there as path pointers while it looks for a NULL pointer, and eventually tries to read memory at an address it is not allowed to access (usually because the page at that address is not mapped).

One way to fix it is to create that array and initialize it on the C side.
It looks like you're using a reasonably recent C standard, so let's just initialize the array with an initializer list:

package main

/*
#include <stddef.h> // for NULL
#include <stdint.h>
#include <stdlib.h> // for C.free
#include <fts.h>
#include <sys/stat.h>

uintmax_t get_total_size(char *path)
{
    uintmax_t total_size = 0;
    char * path_argv[2] = {path, NULL};
    FTS *fts = fts_open(path_argv, FTS_PHYSICAL, NULL);
    FTSENT *fent;
    while ((fent = fts_read(fts)) != NULL)
        if (fent->fts_info == FTS_F)
            total_size += fent->fts_statp->st_size;
    fts_close(fts);
    return total_size;
}
*/
import "C"

import (
    "fmt"
    "unsafe"
)

func main() {
    cpath := C.CString("/usr")
    defer C.free(unsafe.Pointer(cpath))
    fmt.Println(C.get_total_size(cpath))
}

Note that your program has one bug and one possible problem:

  • The bug is that C.CString allocates a chunk of memory by calling malloc(3) from the linked C library, and you never free that memory block.
  • The possible problem is that the symbol NULL is defined in "stddef.h", which your code does not include directly; whether it compiles depends on whether another header happens to pull the definition in.

I've fixed both problems in my example.

A further improvement over our example might be to leverage the ability of the fts_* functions to scan multiple paths in a single run. If we were to implement that, it would make more sense to allocate the array for the first argument of fts_open on the Go side:

package main

/*
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <fts.h>
#include <sys/stat.h>

uintmax_t get_total_size(char * const *path_argv)
{
    uintmax_t total_size = 0;
    FTS *fts = fts_open(path_argv, FTS_PHYSICAL, NULL);
    FTSENT *fent;
    while ((fent = fts_read(fts)) != NULL)
        if (fent->fts_info == FTS_F)
            total_size += fent->fts_statp->st_size;
    fts_close(fts);
    return total_size;
}
*/
import "C"
import (
    "fmt"
    "unsafe"
)

func main() {
    fmt.Println(getTotalSize("/usr", "/etc"))
}

func getTotalSize(paths ...string) uint64 {
    argv := make([]*C.char, len(paths)+1)
    for i, path := range paths {
        argv[i] = C.CString(path)
        defer C.free(unsafe.Pointer(argv[i]))
    }

    return uint64(C.get_total_size(&argv[0]))
}

Note that here we did not explicitly zero out the last element of argv because, unlike C, Go initializes each allocated memory block with zeroes, so once argv is allocated, all its memory is already zeroed and the last element is already a nil pointer.
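The zero-initialization that this note relies on can be checked without cgo. The following is only a minimal sketch: `newArgv` is a hypothetical helper that mimics how `argv` is built in `getTotalSize`, and it uses `*string` in place of `*C.char` so that it runs without a C toolchain.

```go
package main

import "fmt"

// newArgv is a hypothetical stand-in for the argv construction above.
// It allocates one slot per path plus one extra slot that is never assigned.
func newArgv(paths ...string) []*string {
	argv := make([]*string, len(paths)+1)
	for i := range paths {
		argv[i] = &paths[i]
	}
	return argv
}

func main() {
	argv := newArgv("/usr", "/etc")
	// make() zeroes the slice, so the untouched trailing slot is nil,
	// which is what C code sees as the terminating NULL pointer.
	fmt.Println(len(argv), argv[len(argv)-1] == nil)
}
```

The same holds for a slice of `*C.char`: the zero value of any Go pointer type is nil.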

kostix
  • If I run it several times in goroutines, the result is zero. Why is that? `go foo(); go foo(); go foo(); func foo() { cpath := C.CString("/usr"); defer C.free(unsafe.Pointer(cpath)); fmt.Println(C.get_total_size(cpath)) }` – mug896 Aug 16 '21 at 07:38
  • @mug896, I doubt you can do this without heavily rethinking your approach: 95% of the C standard library assumes its code runs in a single thread, and Go, with its so-called M×N scheduling (where M goroutines are freely scheduled on N threads, with N << M), breaks this assumption. For instance, those `fts_*` functions appear to rely heavily on `errno` _which is a global variable._ – kostix Aug 16 '21 at 08:58
  • @mug896, is there any good reason why you need to use `fts_*` instead of Go's stock `path/filepath.Walk` + `os.Stat` (or `path/filepath.WalkDir`, if you're using Go 1.16+)? I mean, those `fts_*` do no real magic: they are just convenient wrappers around `opendir/readdir/closedir/lstat` (or their more low-level counterparts), and Go's standard library contains roughly the same building blocks for traversing directory hierarchies, and they are naturally safe for concurrent use. – kostix Aug 16 '21 at 09:00
  • I have tested `filepath.Walk()`, but it is many times slower than C, so I am just testing a cgo version. Anyway, thanks for the answers. – mug896 Aug 16 '21 at 11:07
  • @mug896, I see. Possibly you can then use the fact that [`errno` can be made thread-local on POSIX systems](https://stackoverflow.com/a/1694170/720999) and pin your scanning goroutines to their underlying threads via [`runtime.LockOSThread`](https://golang.org/pkg/runtime#LockOSThread). Still, I'd try to explore `filepath.WalkDir`: on each iteration it passes a `fs.DirEntry` that already carries the entry's type, so it does not need an `lstat` call for every entry (the size is available via `DirEntry.Info`); it was explicitly created to make FS traversals faster. – kostix Aug 16 '21 at 11:19
  • I have found a solution to the problem. It's not Go's problem; the problem also exists in C with multiple threads. If I pass `FTS_PHYSICAL | FTS_NOCHDIR` as the options to `fts_open()`, the problem disappears. The surprising thing is that the Go runtime automatically gives every goroutine a distinct thread ID. – mug896 Aug 17 '21 at 09:42
  • @mug896, if you meant OS thread IDs, then you are not correct: goroutines do not have such IDs because they are freely scheduled and rescheduled across a pool of OS threads available to the Go runtime powering any Go program (see my other comment). Supposedly what you're observing is an artefact of the fact that a goroutine can only run its code while it's assigned an OS thread, and if a goroutine gets detached from a thread (say, to remain in a wait queue), the scheduler tries hard to subsequently put it back on the thread it was detached from… – kostix Aug 17 '21 at 10:58
  • @mug896, …So if the number of active goroutines is reasonably low, they might appear to have constant OS thread IDs. Don't be deluded by that though. You could read, say, https://morsmachine.dk/go-scheduler and https://rakyll.org/scheduler/ for more info on how Go's M×N scheduler works. – kostix Aug 17 '21 at 11:00
  • You are right. I get the thread ID in the C code above with the pthread_self() function. If I run 400 goroutines executing the C code above, they all have distinct thread IDs and the return values are correct. If I run 1,000 goroutines, 148 thread IDs are duplicated but the return values are still correct. If I run 10,000 goroutines, the return values are incorrect and corrupted (4-core laptop). – mug896 Aug 17 '21 at 12:10

You are getting the error because fts_open requires a pointer to a NULL-terminated array of character pointers, such as char *argv[] = { path, NULL }; (https://linux.die.net/man/3/fts_open).

package main

/*
#include <stdint.h>
#include <fts.h>
#include <sys/stat.h>

uintmax_t get_total_size(char *path)
{
    uintmax_t total_size = 0;
    char *argv[] = { path, NULL };
    FTS *fts = fts_open(argv, FTS_PHYSICAL, NULL);
    if (fts == NULL)
        return 0;
    FTSENT *fent;
    while ((fent = fts_read(fts)) != NULL)
        if (fent->fts_info == FTS_F)
            total_size += fent->fts_statp->st_size;
    fts_close(fts);
    return total_size;
}
*/
import "C"
import "fmt"

func main() {
    fmt.Println(C.get_total_size(C.CString("/usr")))
}

So adding the NULL-terminated array will fix the code.

The same code works when compiled with GCC, but fts_open returns NULL. I am guessing that there is some difference in optimization between GCC and cgo (I am not sure). Either way, reading past &path is undefined behavior in C, so the outcome depends on whatever happens to follow path in memory.

I ran some tests and found that when compiling with GCC, the char ** array happens to be NULL-terminated, whereas with cgo it is not, so you get the SIGSEGV because your code reads through an invalid memory reference.

#include <stdio.h>
#include <string.h>

void try(char **p)
{
   while (*p != NULL)
   {
      printf("%zu\n", strlen(*p));
      ++p;
   }
}

void get_total_size(char *path)
{
   try(&path);
}
int main()
{
   get_total_size("/usr");
}

The C code above works when compiled with GCC.

package main
/*
#include <stdio.h>
#include <string.h>

void try(char **p)
{
   while (*p != NULL)
   {
      printf("%zu\n", strlen(*p));
      ++p;
   }
}

void get_total_size(char *path)
{
   try(&path);
}
*/
import "C"

func main() {
    C.get_total_size(C.CString("/usr"))
}

With the same code built through cgo, as above, you will hit the error.

mug896
Siva Guru