
I have two programs, compiled with GCC 4.1.2 and running on Red Hat 5.5. They are a simple test of shared memory. shmem1.c looks like this:

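#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define STATE_FILE "/program.shared"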
#define  NAMESIZE 1024
#define   MAXNAMES 100
typedef struct
{
    char name[MAXNAMES][NAMESIZE];
    int heartbeat ;
    int iFlag ;
}  SHARED_VAR;

int main (void)
{
    int first = 0;
    int shm_fd;
    static SHARED_VAR *conf;

    if((shm_fd = shm_open(STATE_FILE, (O_CREAT | O_EXCL | O_RDWR),
                   (S_IREAD | S_IWRITE))) > 0 ) {
        first = 1; /* We are the first instance */
    }
    else if((shm_fd = shm_open(STATE_FILE, (O_CREAT | O_RDWR),
                    (S_IREAD | S_IWRITE))) < 0) {
        printf("Could not create shm object. %s\n", strerror(errno));
        return errno;
    }
    if((conf =  mmap(0, sizeof(SHARED_VAR), (PROT_READ | PROT_WRITE),
               MAP_SHARED, shm_fd, 0)) == MAP_FAILED) {

        return errno;
    }
    if(first) {
        for(idx=0;idx< 1000000000;idx++)
        {
            conf->heartbeat = conf->heartbeat + 1 ;
        }
    }
    printf("conf->heartbeat=(%d)\n",conf->heartbeat) ;
    close(shm_fd);
    shm_unlink(STATE_FILE);
    exit(0);
}//main

And shmem2.c looks like this:

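#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define STATE_FILE "/program.shared"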
#define  NAMESIZE 1024
#define   MAXNAMES 100

typedef struct
{
    char name[MAXNAMES][NAMESIZE];
    int heartbeat ;
    int iFlag  ;
}  SHARED_VAR;

int main (void)
{
    int first = 0;
    int shm_fd;
    static SHARED_VAR *conf;

    if((shm_fd = shm_open(STATE_FILE, (O_RDWR),
                    (S_IREAD | S_IWRITE))) < 0) {
        printf("Could not create shm object. %s\n", strerror(errno));
        return errno;
    }
    ftruncate(shm_fd, sizeof(SHARED_VAR));
    if((conf =  mmap(0, sizeof(SHARED_VAR), (PROT_READ | PROT_WRITE),
               MAP_SHARED, shm_fd, 0)) == MAP_FAILED) {
        return errno;
    }
    int idx ;
    for(idx=0;idx< 1000000000;idx++)
    {
        conf->heartbeat = conf->heartbeat + 1 ;
    }
    printf("conf->heartbeat=(%d)\n",conf->heartbeat) ;
    close(shm_fd);
    exit(0);
}

I compiled them with:

   gcc shmem1.c -lpthread -lrt -o shmem1.exe
   gcc shmem2.c -lpthread -lrt -o shmem2.exe

I ran both programs at almost the same time in two terminals:

   [test]$ ./shmem1.exe
   First creation of the shm. Setting up default values
   conf->heartbeat=(840825951)
   [test]$ ./shmem2.exe
   conf->heartbeat=(1215083817)

I am confused! Since shmem1.c loops 1,000,000,000 times, how is it possible to get a result like 840,825,951?

When I run shmem1.exe and shmem2.exe this way, conf->heartbeat usually ends up larger than 1,000,000,000, but occasionally, and at random, it ends up less than 1,000,000,000, in either shmem1.exe or shmem2.exe!

If I run shmem1.exe alone, it always prints 1,000,000,000. My question is: what causes conf->heartbeat=(840825951) in shmem1.exe?

Update: I am not sure, but I think I have figured out what is going on. Suppose shmem1.exe has gone through the loop 10 times, so conf->heartbeat = 10. shmem1.exe then pauses for a while, and when it resumes it reads conf->heartbeat = 8 from shared memory and continues from 8. Why 8? I think it is because shmem2.exe updated the shared memory to 8, and shmem1.exe had not written 10 back to shared memory before it paused. That is just my theory; I don't know how to prove it!

barfatchen
  • Where do you define idx in the first program? Please paste the complete code. – louxiu Dec 12 '12 at 04:07
  • You should probably use a mechanism to protect your shared memory from simultaneous access; you need either a mutex or semaphores: http://stackoverflow.com/a/12468183/1634695 – Mehdi Karamosly Dec 13 '12 at 20:42

1 Answer


The values you are getting back indicate that you are not incrementing the shared memory atomically. The following loop:

int idx ;
for(idx=0;idx< 1000000000;idx++)
{
    conf->heartbeat = conf->heartbeat + 1 ;
}

boils down to this:

int idx ;
for(idx=0;idx< 1000000000;idx++)
{
    // read
    int heartbeat= conf->heartbeat;

    // write
    conf->heartbeat = heartbeat + 1 ;
}

Between the read and the write, a process can be preempted to let another process run, and on a multi-core machine the two processes can also run truly in parallel. If shmem1.exe and shmem2.exe are both running, shmem1.exe can increment conf->heartbeat many times between shmem2.exe's read and its write, or vice versa. The later write then stores a stale value, and all of the increments made in between are lost.

If you want a consistent update, you need to use your platform's atomic increment functions (with GCC, for example, the __sync_fetch_and_add builtin). These guarantee that the read/modify/write sequence always increments the current value, rather than potentially writing back a stale one.

For example, without any synchronization between shmem1.exe and shmem2.exe, you could have this pathological case that has shmem1.exe and shmem2.exe both outputting 2:

shmem1.exe: read 0
shmem2.exe: read 0
// shmem2.exe goes to sleep for a loooong time
shmem1.exe: write 1
// ... shmem1.exe keeps running
shmem1.exe: write 999,999,999
// shmem2.exe wakes up
shmem2.exe: write 1
shmem2.exe: read 1
// shmem2.exe goes back to sleep
shmem1.exe: read 1(!)
// shmem1.exe goes to sleep
// shmem2.exe wakes up
shmem2.exe: write 2
shmem2.exe: read 2
shmem2.exe: write 3
// shmem2.exe continues, shmem1.exe stays asleep
shmem2.exe: read 999,999,999
shmem2.exe: write 1,000,000,000
// shmem2.exe goes to sleep, shmem1.exe wakes up
shmem1.exe: write 2(!)
shmem1.exe: read 2
shmem1.exe: print 2
// shmem2.exe wakes up
shmem2.exe: read 2
shmem2.exe: print 2

This can happen without CPU reordering, just scheduling madness.

MSN
  • Thanks, I got your point. After I added a spinlock to both programs, whichever of shmem1.exe or shmem2.exe finishes last gets 2,000,000,000, which is perfect. What confuses me is this: even without an atomic operation on conf->heartbeat, how can shmem1.exe get a number less than 1,000,000,000? I expected at least 1,000,000,000. – barfatchen Dec 12 '12 at 04:25
  • I added asm volatile("" ::: "memory") after conf->heartbeat = conf->heartbeat + 1 and still get numbers less than 1,000,000,000, but if I add asm volatile("mfence" ::: "memory") instead, neither shmem1.exe nor shmem2.exe ever prints a result less than 1,000,000,000. So I wonder whether this is a case of CPU reordering? – barfatchen Dec 12 '12 at 04:35
  • @barfatchen, I updated my answer with a scenario under which you can get shmem1.exe and shmem2.exe to output 2. – MSN Dec 13 '12 at 02:58
  • Shouldn't there be a crash if two processes write to the same memory at the same time? – Mehdi Karamosly Dec 13 '12 at 20:17
  • @MSN I agree with what you said about atomic RMW. In addition to that issue, I thought that to make the compiler fully aware that we are accessing shared memory via conf, we need to declare conf as `volatile SHARED_VAR*`; otherwise, the compiler might optimize so aggressively that it never actually reads or writes main memory. I ran into such an issue when I didn't read via a `volatile variable`. [link](https://stackoverflow.com/questions/51168908/fail-to-read-through-shared-memory) – HCSF Aug 12 '18 at 13:50