2

i'm trying to make a single instance daemon using a lock file but fcntl() doesn't seem to work as expected...

int creat_lock_file (char * pid_fn)
{
  struct flock pid_lck = {F_WRLCK, SEEK_SET,   0,      0,     0 };

  /* Open/Create pid file */
  int pid_fd = open (pid_fn, O_CREAT | O_WRONLY, 0640);
  if (pid_fd == -1)
  {
    syslog (LOG_ERR, "Unable to open PID file > [Errno: %s]",    strerror(errno));
    return -1;
  }

  /* Place write lock on pid file */
  if (fcntl(pid_fd, F_SETLK, &pid_lck) == -1) {
    /* Unhandled error ocured */
    syslog (LOG_ERR, "Unhandled error ocured while locking PID file > [Errno: %s]", strerror(errno));
    close (pid_fd);
    return -1;
  }

  /* Write PID to lock file */
  char pid_lock_buf[11];
  sprintf (pid_lock_buf, "%ld\n", (long) getpid ());
  write (pid_fd, pid_lock_buf, strlen (pid_lock_buf)+1);

  return 0;
}

int get_lock_file_status (char * pid_fn)
{
  struct flock pid_lck = {F_WRLCK, SEEK_SET,   0,      0,     0 };

  /* Open/Create pid file */
  int pid_fd = open (pid_fn, O_CREAT | O_WRONLY, 0640);
  if (pid_fd == -1)
  {
    syslog (LOG_ERR, "Unable to open PID file > [Errno: %s]", strerror(errno));
    return -1;
  }

  /* try locking pid file*/
  if(fcntl(pid_fd, F_GETLK, &pid_lck) == -1)
  {
    if(errno == EAGAIN || errno == EACCES) /* file already locked, close fd and return -1 */
    {
      close (pid_fd);
      return -1;
    }
    else /* Unhandled error ocured */
    {
      syslog (LOG_ERR, "Unhandled error ocured while locking PID file > [Errno: %s]", strerror(errno));
      close (pid_fd);
      return -1;
    }
  }

  close (pid_fd);
  return 0;
}

so i call get_lock_file_status and exit if it returns -1 to make sure no other instance is running than i do some stuff (fork chdir etc) and call creat_lock_file to crate and lock a pid file after successfully creating a daemon...

when complied and run the program works as expected runs creates lock file and writes pid to it but when a second instance is started the second instance simply opens the same lock file and writes it's own pid to it!

what am i doing wrong? shouldn't the second instance return -1 in get_lock_file_status?

Unheilig
  • 16,196
  • 193
  • 68
  • 98
  • this seems to be the wrong approach. for programming a file lock, the flock() function would seem to be the right approach, This function, on linux, is detailed in 'man 2 flock' or at : – user3629249 Mar 14 '15 at 01:00
  • @user3629249: `fcntl(2)` with `F_SET/GET/LK` is fine, and also POSIX, which `flock(2)` isn't (though it's widely available). The two methods have different behavior w.r.t. lock inheritance and some other things. `fcntl()` can set locks on file regions, which `flock()` can't. – Ulfalizer Mar 14 '15 at 08:17

1 Answers1

1

You're checking the result of F_GETLK in the wrong way. fcntl(2) with F_GETLK only returns -1 on errors. The correct way to check if it would be possible to acquire the lock is to check if the l_type field of the struct flock is set to F_UNLCK, like in the following:

/* try locking pid file*/
if(fcntl(pid_fd, F_GETLK, &pid_lck) == -1) {
    syslog (LOG_ERR, "Unhandled error ocured while locking PID file > [Errno: %s]", strerror(errno));
    close(pid_fd);
    return -1;
}
close(pid_fd);
return (pid_lck.l_type == F_UNLCK) ? 0 : -1;

It should be possible to roll creat_lock_file() and get_lock_file_status() into a single function that opens the file (and creates it if it doesn't exist), tries to set a lock on it, and returns whether the locking was successful (e.g., either a file descriptor or -1).

You should truncate(2) the PID file to zero bytes before writing the PID into it by the way. Say the PID of your process is 5 and the old PID in the PID file is 123. Writing "5" would then make the contents of the PID file "523". Truncating the file to zero bytes solves this problem. (O_TRUNC won't work since you'd be clearing the file when opening it to test if the lock is set.)

Removing the PID file with unlink(2) when the program exits is pretty common too. That way the non-existence of the file indicates the daemon isn't running (though it's not foolproof as the process or system could crash).

Ulfalizer
  • 4,664
  • 1
  • 21
  • 30
  • Hi and thanks for the answer, as i read your answer i thought of something you said testing if a file exists is not a good idea cause the program may crass and not have time to `unlink()` that file, but what if i read the file and try to look up the pid to see if any process runs with the same pid? is that a good idea? – Aristos Miliaressis Mar 14 '15 at 09:33
  • @AristosMiliaressis: Using file locking to check if the daemon is running will work even if the daemon crashes and doesn't remove the PID file. If the daemon crashes, then the file lock is removed automatically (because it's tied to the process), and the new daemon instance will be able to lock the file (which tells it that no other instance is running). – Ulfalizer Mar 14 '15 at 09:38
  • For checking if the daemon is really running from e.g. the shell, checking whether there's a process with the same PID as in the file should work, yeah. It *could* fail if the daemon crashes and some other process happens to get the same PID, but that that would be pretty uncommon. (And you could check what the command that started the process was with `ps(1)` too to make sure.) – Ulfalizer Mar 14 '15 at 09:42
  • I've never done file locking from the shell myself, but that should be possible too (if you want a shell script to do the same thing your code does for example). See this answer for example: http://stackoverflow.com/questions/1715137/the-best-way-to-ensure-only-1-copy-of-bash-script-is-running. – Ulfalizer Mar 14 '15 at 09:50
  • @AristosMiliaressis: Did you find anything? I edited the answer a bit. I'll check back tomorrow and make sure the code works on my machine (though it ought to from looking at it, provided you don't call `get_lock_file_status()` again after calling `creat_lock_file()`). – Ulfalizer Mar 14 '15 at 15:13
  • @AristosMiliaressis: I tried out your code and found the actual problem you're running into. See the updated answer. I ought to guess less sometimes. :) – Ulfalizer Mar 15 '15 at 07:05