2

I have a program that copies buffers to files, mmap's them back and then checks their contents. Multiple threads can work on the same file. Occasionally, I am getting SIGBUS when reading, but only under load.

The mappings are MAP_PRIVATE and MAP_POPULATE. The crash via SIGBUS occurs after mmap was successful which I do not understand since MAP_POPULATE was used.

Here is a full example (creates files under /tmp/buf_* filled with zeroes), using OpenMP to create more load and concurrent writes:

// Program to check for unexpected SIGBUS
// gcc -std=c99 -fopenmp -g -O3 -o mmap_manymany mmap_manymany.c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define NBUFS 64
const char bufs[NBUFS][65536] = {{0}};
const char zeros[65536] = {0};

int main()
{
  int count = 0;
  while ( 1 )
  {
    void *mappings[ 1000 ] = {NULL};

#pragma omp parallel for
    for ( int i = 0; i < 1000; ++i )
    {
      // Prepare filename
      int bufIdx = i % NBUFS;
      char path[ 128 ] = { 0 };
      sprintf( path, "/tmp/buf_%0d", bufIdx );

      // Write full buffer
      int outFd = -1;
#pragma omp critical
      {
        remove( path );
        outFd = open( path, O_EXCL | O_CREAT | O_WRONLY | O_TRUNC, 0644 );
      }
      assert( outFd != -1 );
      ssize_t size = write( outFd, bufs[bufIdx], 65536 );
      assert( size == 65536 );
      close( outFd );

      // Map it to memory
      int inFd = open( path, O_RDONLY );
      if ( inFd == -1 )
        continue; // Deleted by other thread. Nevermind

      mappings[i] = mmap( NULL, 65536, PROT_READ, MAP_PRIVATE | MAP_POPULATE, inFd, 0 );
      assert( mappings[i] != MAP_FAILED );
      close( inFd );

      // Read data immediately. Creates occasional SIGBUS but only under load.
      int v = memcmp( mappings[i], zeros, 65536 );
      assert( v == 0 );
    }

    // Clean up
    for ( int i = 0; i < 1000; ++i )
      munmap( mappings[ i ], 65536 );
    printf( "count: %d\n", ++count );
  }
}

No assert fires for me, but the program always crashes after a few seconds with SIGBUS.

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
bking
  • 275
  • 2
  • 11
  • May be related to unanswered https://stackoverflow.com/q/21230720/3766665 but I am not using a shared mapping – bking Jun 14 '17 at 10:59
  • *Where* do you get the crash? You have a couple of statements and expressions after your `mmap` call that could be the cause. Please use a debugger to narrow it down. And have you tried *without* OpenMP? Does it work then? – Some programmer dude Jun 14 '17 at 11:02
  • It crashes directly in the memcmp call. The kernel sends SIGBUS from do_page_fault which I found with perf like this: sudo perf record -g -e signal:signal_generate ./mmap_manymany – bking Jun 14 '17 at 11:09
  • It does not happen without OpenMP. – bking Jun 14 '17 at 11:10

1 Answers1

1

With your current program, it can happen that thread 0 creates /tmp/buf_0, writes to it and closes it. Then thread 1 removes and creates /tmp/buf_0, but before thread 1 writes to it, thread 0 opens, maps, and reads from /tmp/buf_0 - and thus tries to access a file does not yet contain 64 kiB data. You get a SIGBUS.

To avoid that issue, just make unique files / and bufs for each thread, by using omp_get_thread_num() instead of bufIdx.

Zulan
  • 21,896
  • 6
  • 49
  • 109
  • Yes! This explanation makes perfect sense. I can not name my files differently, but I can open them with `O_RDWR` and use that file descriptor directly in my `mmap`. This avoids the race you described. – bking Jun 14 '17 at 13:05
  • Still unclear to me: The example code fails in `memcmp`. Why not within `mmap` due to `MAP_POPULATE`? – bking Jun 14 '17 at 13:12
  • For some reason, Linux ignores errors on populating memory pages... https://github.com/torvalds/linux/blob/master/include/linux/mm.h#L2141 – Zulan Jun 14 '17 at 14:02