15

I have a trivial program:

int main(void)
{
  const char sname[]="xxx";
  sem_t *pSemaphor;
  if ((pSemaphor = sem_open(sname, O_CREAT, 0644, 0)) == SEM_FAILED) {
    perror("semaphore initilization");
    exit(1);
  }
  sem_unlink(sname);
  sem_close(pSemaphor);
}

When I run it under valgrind, I get the following error:

==12702== Syscall param write(buf) points to uninitialised byte(s)
==12702==    at 0x4E457A0: __write_nocancel (syscall-template.S:81)
==12702==    by 0x4E446FC: sem_open (sem_open.c:245)
==12702==    by 0x4007D0: main (test.cpp:15)
==12702==  Address 0xfff00023c is on thread 1's stack
==12702==  in frame #1, created by sem_open (sem_open.c:139)

The code was extracted from a bigger project where it ran successfully for years, but now it is causing segmentation fault.

The valgrind error from my example is the same as seen in the bigger project, but there it causes a crash, which my small example doesn't.

user2194898
  • 149
  • 6
  • Update. It is compiler version independent. I see it on libc-2.21 but not on libc-2.19 – user2194898 Feb 23 '16 at 13:58
  • 5
    Did you find out a solution? – amq May 28 '16 at 00:33
  • I'm also seeing this with libc-2.23 (Ubuntu 16.04). It appears when I run valgrind on my googletest-based unit tests. After a few of these, I get a segmentation fault. I don't get any of these errors or the segfault on OSX (same code). – davidA Mar 05 '18 at 04:00

1 Answers1

3

I see this with glibc 2.27-5 on Debian. In my case I only open the semaphores right at the start of a long-running program and it seems harmless so far - just annoying.

Looking at the code for sem_open.c which is available at: https://code.woboq.org/userspace/glibc/nptl/sem_open.c.html

It seems that valgrind is complaining about the line (270 as I look now):

if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof (sem_t)))
      == sizeof (sem_t)

However sem.initsem is properly initialised earlier in a fairly baroque manner, firstly by explicitly setting fields in the sem.newsem (part of the union), and then once that is done by a call to memset (L226-228):

  /* Initialize the remaining bytes as well.  */
  memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0',
          sizeof (sem_t) - sizeof (struct new_sem));

I think that this particular shenanigans is all quite optimal, but we need to make sure that all of the fields of new_sem have actually been initialised... we find the definition in https://code.woboq.org/userspace/glibc/sysdeps/nptl/internaltypes.h.html and it is this wonderful creation:

struct new_sem
{
#if __HAVE_64B_ATOMICS
  /* The data field holds both value (in the least-significant 32 bytes) and
     nwaiters.  */
# if __BYTE_ORDER == __LITTLE_ENDIAN
#  define SEM_VALUE_OFFSET 0
# elif __BYTE_ORDER == __BIG_ENDIAN
#  define SEM_VALUE_OFFSET 1
# else
# error Unsupported byte order.
# endif
# define SEM_NWAITERS_SHIFT 32
# define SEM_VALUE_MASK (~(unsigned int)0)
  uint64_t data;
  int private;
  int pad;
#else
# define SEM_VALUE_SHIFT 1
# define SEM_NWAITERS_MASK ((unsigned int)1)
  unsigned int value;
  int private;
  int pad;
  unsigned int nwaiters;
#endif
};

So if we __HAVE_64B_ATOMICS then the structure has a data field which contains both the value and the nwaiters, otherwise these are separate fields.

In the initialisation of sem.newsem we can see that these are initialised correctly, as follows:

#if __HAVE_64B_ATOMICS
      sem.newsem.data = value;
#else
      sem.newsem.value = value << SEM_VALUE_SHIFT;
      sem.newsem.nwaiters = 0;
#endif
      /* pad is used as a mutex on pre-v9 sparc and ignored otherwise.  */
      sem.newsem.pad = 0;
      /* This always is a shared semaphore.  */
      sem.newsem.private = FUTEX_SHARED;

I'm doing all of this on a 64-bit system, so I think that valgrind is complaining about the initialisation of the 64-bit sem.newsem.data with a 32-bit value since from:

  value = va_arg (ap, unsigned int);

We can see that value is defined simply as an unsigned int which will usually still be 32 bits even on a 64-bit system (see What should be the sizeof(int) on a 64-bit machine?), but that should just be an implicit cast to 64-bits when it is assigned.

So I think this is not a bug - just valgrind getting confused.

karora
  • 1,223
  • 14
  • 31
  • I haven't observed a crash here - can you get it to happen with gdb? What architecture are you building on? If you can build glibc you could try to add a `memset(&sem, 0, sizeof(sem));` before any of the initialisation just to confirm that resolved the valgrind message. – karora Sep 06 '18 at 10:25
  • Please read meowsqueak's comment above. In my case it was Debian 8 system and I do not have any more given glibc version. Not even the computer where it was crashing ... :( Sorry. – user2194898 Sep 06 '18 at 14:17
  • I took a look at the current glibc 2.27 code from Debian and the sem_open is subtly different from the links above. In that version it is possible for `sem.newsem.pad` not to be initialised in the `__HAVE_64B_ATOMICS` path but it comes with the note `/* pad is used as a mutex on pre-v9 sparc and ignored otherwise. */` which strongly suggests it would not be an issue. I've run my own code under the debugger, including writing lots of junk into the stack before the call, and can confirm that the structure is initialised just fine. If @meowsqueak gets a crash I wouldn't blame this. – karora Sep 06 '18 at 17:04