I'm working on a platform running MontaVista Linux 3.1.

I have a C++ application which, for esoteric reasons I won't go into, has to remount the JFFS2 flash file system quite regularly between read-only and read-write.

When you perform an int mount(...) call, declared in sys/mount.h, to set the file system read-write, the jffs2_gcd_mtd0 garbage collector process gets kicked off as you would expect. However, when you repeat the mount call to go back to read-only, jffs2_gcd_mtd0 gets killed and becomes a defunct process.

After a few minutes, we end up with a shed load of defunct jffs2_gcd_mtd0 processes which, no matter what we do, we can't get rid of.

I can replicate the problem with the following test app:

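#include <sys/mount.h>  // mount(), MS_* flags
#include <unistd.h>     // sleep()

int main()
{
  while(true)
  {
    // Remount read-write: this starts the jffs2_gcd_mtd0 garbage collector.
    mount("/dev/mtdblock/0", "flash", "", MS_REMOUNT|MS_POSIXACL|MS_ACTIVE|MS_NOUSER|0XEC0000, "");
    sleep(1);
    // Remount read-only: the garbage collector is killed and left defunct.
    mount("/dev/mtdblock/0", "flash", "", MS_RDONLY|MS_REMOUNT|MS_POSIXACL|MS_ACTIVE|MS_NOUSER|0XEC0000, "");
    sleep(1);
  }
}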

I have tried various methods to reap the defunct processes: setting signal(SIGCHLD, SIG_IGN) (doesn't work); calling wait(int*) after the remount to read-only (fails, with errno set to 10, ECHILD, "No child processes"); calling kill(0, SIGCHLD) (doesn't work).
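
For reference, the wait attempt can be sketched roughly as below. This is only an illustration, not the exact code from the application: the helper name try_reap_any_child is hypothetical, and it uses waitpid() with WNOHANG instead of a blocking wait() so that it never hangs. When the calling process has no children of its own, the call fails with ECHILD, which matches the error described above.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <cerrno>

// Hypothetical helper: try to reap any one child of the calling process
// without blocking. Returns 0 if the call succeeded (a child was reaped, or
// children exist but none has exited yet), otherwise the errno value.
int try_reap_any_child()
{
  int status = 0;
  pid_t pid = waitpid(-1, &status, WNOHANG);
  if (pid == -1)
    return errno;  // ECHILD means the caller has no children to wait for
  return 0;
}
```

An ECHILD result here suggests that the defunct jffs2_gcd_mtd0 entries are not seen as children the calling thread is allowed to wait for, which is why reaping them from the application fails.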

Am I correct in assuming this is a bug in the mount implementation we have? Given that this is a bug, how could I remove the defunct processes, and stop the process ID table from filling up?


Some supplementary info: this problem doesn't seem to occur when I run the test app with strace. Now I'm getting really stumped!

Matt Dunn
  • Sounds like a bug in the jffs2 code. I wonder if doing a umount() before the mount() call would help avoid it? – Jeremy Friesner Feb 04 '13 at 02:44
  • `strace` slows things down so it could be a race condition in jffs2 filesystem driver code. Try compiling the latest jffs2 driver from https://github.com/torvalds/linux/tree/master/fs/jffs2. Do you see any errors in the logs of `dmesg` output? – Austin Phillips Feb 04 '13 at 12:55
  • Thanks for the comments guys. Unfortunately calling `umount()` first doesn't make a scrap of difference I'm afraid, and `dmesg` doesn't show any useful output either. I think I've found a workaround, which I'll outline below, so think I can live with this bug. But if I continue to have problems, then I'll try and recompile the jffs2 driver. – Matt Dunn Feb 05 '13 at 14:37
  • Any reason you don't post this issue on the kernel or jffs2 mailing list? – Maxim Egorushkin Feb 05 '13 at 15:57

1 Answer

As a workaround, I have found that calling mount() from within a pthread allows the defunct jffs2_gcd_mtd0 processes to be reaped.

I believe this works via the following mechanism: when the thread is joined, the spawned jffs2_gcd_mtd0 process is left without a parent. It is therefore inherited by init, which reaps it when it finishes.
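
A minimal sketch of the workaround, assuming the same device and mount point as the test app in the question. The names RemountRequest, remount_worker and remount_in_thread are hypothetical, chosen only for illustration; the real application code may be structured differently.

```cpp
#include <pthread.h>
#include <sys/mount.h>
#include <cerrno>
#include <cstddef>

// Hypothetical request structure handed to the worker thread.
struct RemountRequest
{
  const char* source;
  const char* target;
  unsigned long flags;
  int result;       // return value of mount()
  int saved_errno;  // errno as seen inside the worker thread
};

// Thread entry point: performs the mount() call on behalf of the caller.
static void* remount_worker(void* arg)
{
  RemountRequest* req = static_cast<RemountRequest*>(arg);
  errno = 0;
  req->result = mount(req->source, req->target, "", req->flags, "");
  req->saved_errno = errno;
  return NULL;
}

// Runs mount() in a short-lived pthread and joins it before returning.
// Returns the result of mount(), with errno copied back from the worker.
int remount_in_thread(const char* source, const char* target,
                      unsigned long flags)
{
  RemountRequest req = { source, target, flags, -1, 0 };
  pthread_t tid;
  int rc = pthread_create(&tid, NULL, remount_worker, &req);
  if (rc != 0)
  {
    errno = rc;  // pthread_create reports errors via its return value
    return -1;
  }
  pthread_join(tid, NULL);
  errno = req.saved_errno;
  return req.result;
}
```

In the test loop, each direct mount(...) call would then become something like remount_in_thread("/dev/mtdblock/0", "flash", flags), with flags set to the same read-write or read-only combination as before. A fresh thread is created and joined for every remount, so whatever the mount call spawned is no longer attached to a live thread once the helper returns.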

If anyone would like to correct/expand on my explanation above, please do!

Matt Dunn