Summary
I have worked out a solution to the issue of this question.
Basically, the callee (wallpaper) was not itself exiting because it was waiting on another process to finish.
Over the course of 52 days, this problematic side effect had snowballed until 10,000+ lingering processes were consuming 10+ gigabytes of RAM, almost crashing my system.
The offending process turned out to be a call to printf from a function called log that I had sent into the background and forgotten about, because it was writing to a pipe and hanging.
As it turns out, a process writing to a named pipe (FIFO) will block until another process opens the pipe for reading.
This, in turn, changed the requirements of the question from "I need a way to stop these processes from building up" to "I need a better way of getting around FIFO I/O than throwing it to the background".
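The blocking behavior described above can be reproduced in isolation. The following is a minimal sketch, not code from my setup: it assumes GNU coreutils `timeout` is available, and the names `demo_dir`, `demo_pipe`, and `try_write` are made up for the demonstration.

```shell
# Demo: writing to a FIFO blocks until a reader opens the other end.
# Assumes GNU coreutils `timeout`; all names here are hypothetical.
demo_dir=$(mktemp -d)
demo_pipe=$demo_dir/demo.fifo
mkfifo "$demo_pipe"

# Try to write one line to the FIFO, giving up after one second.
try_write() {
    timeout 1 sh -c 'printf "%s\n" "$2" > "$1"' _ "$demo_pipe" "$1"
}

# Case 1: no reader. The redirection cannot even finish opening the FIFO,
# so the writer hangs until timeout kills it.
if try_write "hello"; then no_reader=wrote; else no_reader=blocked; fi

# Case 2: a reader is waiting. The same write now returns at once.
cat "$demo_pipe" > "$demo_dir/out" &
reader_pid=$!
if try_write "hello"; then with_reader=wrote; else with_reader=blocked; fi
wait "$reader_pid"
received=$(cat "$demo_dir/out")

echo "no reader: $no_reader; with reader: $with_reader; received: $received"
rm -rf "$demo_dir"
```

Without the `timeout` wrapper, the first write would simply never return, which is what each backgrounded printf was doing.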
Note that while the question has been solved, I'm more than happy to accept an answer that goes into detail on the technical level. For example, the unsolved mystery of why the caller script's (wallpaper-run) process was being duplicated as well, even though it was only called once, or how to read a pipe's state information proper, rather than relying on open's failure when called with O_NONBLOCK.
The original question follows.
The Question
I have two bash scripts meant to run in a loop. The first, wallpaper-run, runs in an infinite loop and calls the second, wallpaper.
They are part of my "desktop", which is a bunch of hacked-together shell scripts augmenting the dwm window manager.
wallpaper-run:
log "starting wallpaper runner"
while true; do
    log "..."
    $scr/wallpaper
    sleep 900 # 15 minutes
done &
wallpaper:
log "changing wallpaper"
# several utility functions ...
if [[ $1 ]]; then
    parse_arg $1
else
    load_random
fi
Some notes:
log is an exported function from init, which, as its name suggests, logs a message.
init calls wallpaper-run (among other things) in its foreground (hence the while loop being in the background).
$scr is also defined by init; it is the directory where so-called "init-scripts" go.
parse_arg and load_random are local to wallpaper.
In particular, images are loaded into the background via the program feh.
The manner in which wallpaper-run is loaded is as such:
$mod/wallpaper-run
init is called directly by startx, and starts dwm before it runs wallpaper-run (and the other "modules").
Now on to the problem, which is that for some reason, both wallpaper-run and wallpaper "linger" in memory. That is to say that after each iteration of the loop, two new instances of wallpaper and wallpaper-run are created, while the "old" ones don't get cleaned up and get stuck in sleep status. It's like a memory leak, but with lingering processes instead of bad memory management.
I found out about this "process leak" after having my system up for 52 days, when everything broke (something like bash: cannot fork: resource temporarily unavailable spammed the terminal whenever I tried to run a command) because the system ran out of memory. I had to kill over 10,000 instances of wallpaper and wallpaper-run to bring my system back to working order.
I have absolutely no idea why this is the case. I see no reason for these scripts to linger in memory because a script exiting should mean that its process gets cleaned up.
Why are they lingering and eating up resources?
Update 1
With some help from the comments (much thanks to I'L'I), I've traced the problem to the function log, which makes background calls to printf (though why I chose to do that, I don't recall). Here is the function as it appears in init:
log(){
    local pipe=$pipe_front
    if ! [[ -p $pipe ]]; then
        mkfifo $pipe
    fi
    printf ... >> $initlog
    printf ... > $pipe &
    printf ... &
    [[ $2 == "-g" ]] && notify-send "[DWM Init] $1"
    sleep 0.001
}
As you can see, the function is very poorly written. I hacked it together to make it work, not to make it robust.
The second and third printf are sent to the background. I don't recall why I did this, but it's presumably because the first printf must have been making log hang.
The printf lines have been abridged to "...", because they are fairly complex and not relevant to the issue at hand (And also I have better things to do with 40 minutes of my time than fighting with Android's garbage text input interface). In particular, things like the current time, name of the calling process, and the passed message are printed, depending on which printf we're talking about. The first has the most detail because it's saved to a file where immediate context is lost, while the notify-send line has the least amount of detail because it's going to be displayed on the desktop.
The whole pipe debacle is for interfacing directly with init via a rudimentary shell that I wrote for it.
The third printf is intentional; it prints to the tty that I log into at the beginning of a session. This is so that if init suddenly crashes on me, I can see a log of what went wrong. Or at least what was happening before it crashed.
I'm including this in the question because this is the root cause of the "leak". If I can fix this function, the issue will be resolved.
The function needs to log the messages to their respective destinations and halt until each call to printf finishes, but it also must finish in a timely manner; hanging for an indefinite period of time and/or failing to log the messages is unacceptable behavior.
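One way to express the "wait for the write, but only for so long" requirement is to put a deadline on the pipe write instead of sending it to the background. This is only a sketch under assumptions: `write_fifo_bounded` is a hypothetical helper, not part of init, and it relies on GNU coreutils `timeout` (which accepts fractional seconds and exits with status 124 when the deadline passes).

```shell
# Hypothetical helper, not the original log function: write one line to a
# FIFO in the foreground, but give up after a deadline instead of hanging.
# Assumes GNU coreutils `timeout` (exit status 124 means the deadline passed).
write_fifo_bounded() {
    # $1 = FIFO path, $2 = message, $3 = deadline in seconds (default 0.2)
    timeout "${3:-0.2}" sh -c 'printf "%s\n" "$2" > "$1"' _ "$1" "$2"
}

# Usage with a throwaway FIFO that nobody reads: the call returns after the
# deadline with a non-zero status and leaves no process behind.
demo_dir=$(mktemp -d)
mkfifo "$demo_dir/front.fifo"
if write_fifo_bounded "$demo_dir/front.fifo" "changing wallpaper"; then
    echo "delivered"
else
    echo "no reader within the deadline; message not delivered to the pipe"
fi
rm -rf "$demo_dir"
```

The trade-off is that a message which misses the deadline never reaches the pipe, so this covers "finish in a timely manner" only at the cost of dropping pipe output while nothing is reading.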
Update 2
After isolating the log function (see update 1) into a test script and setting up a mock environment, I've boiled it down to printf.
The printf call which is redirected into a pipe,
printf "..." > $pipe
hangs if nothing is listening to it, because it's waiting for a second process to pick up the read end of the pipe and consume the data. This is probably why I had initially forced them into the background, so that a process could, at some point, read the data from the pipe while, in the immediate case, the system could move on and do other things.
The call to sleep, then, was a not-well-thought-out hack to work around data race problems resulting from one reader trying to read from multiple writers simultaneously. The theory was that if each writer had to wait for 0.001 seconds (despite the fact that the printf in the background has nothing to do with the sleep following it), somehow, that would make the data appear in order and fix the bug. Of course, looking back, that really does nothing useful.
The end result is several background processes hanging on to the pipe, waiting for something to read from it.
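For comparison, here is a sketch of an arrangement in which writers do not pile up: the reading side keeps the FIFO open for its whole lifetime. It assumes Linux, where opening a FIFO read-write succeeds without blocking (POSIX leaves that case unspecified), and all names in it are hypothetical. Each writer emits one short line; POSIX guarantees that a single write of at most PIPE_BUF bytes is not interleaved with other writers' data, so no sleep is needed to keep lines intact.

```shell
# Sketch: the reading side holds the FIFO open read-write, so there is
# always a reader and writers never wait for one to appear.
# Assumption: Linux semantics; POSIX leaves read-write opens of a FIFO
# unspecified. All names here are hypothetical.
demo_dir=$(mktemp -d)
demo_pipe=$demo_dir/front.fifo
mkfifo "$demo_pipe"
exec 3<> "$demo_pipe"

# Five concurrent writers, one short line each, no sleep involved.
for i in 1 2 3 4 5; do
    printf 'message from writer %s\n' "$i" > "$demo_pipe" &
done
wait    # every writer has exited; none is left hanging on the pipe

# Drain exactly five lines; each arrives whole, though in no fixed order.
received=""
for i in 1 2 3 4 5; do
    IFS= read -r line <&3
    received="$received$line
"
done
exec 3<&-
rm -rf "$demo_dir"
printf '%s' "$received" | sort
```

The data sits in the kernel's pipe buffer until it is read. If nothing ever drains it, the buffer eventually fills and writers block again, so this moves the problem rather than removing it.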
The answer to "Prevent hanging of "echo STRING > fifo" when nothing..." presents the same "solution" that caused the bug that spawned this question. Obviously incorrect. However, an interesting comment by user R.. mentioned something about fifos containing state which includes information such as what processes are reading the pipe.
Storing state? You mean the absence/presence of a reader? That's part of the state of the fifo; any attempt to store it outside would be bogus and would be subject to race conditions.
Obtaining this information and refusing to write if there is no reader is the key to solving this.
However, no matter what I search for on Google, I can't seem to find anything about reading the state of a pipe, even in C. I am perfectly willing to use C if need be, but a bash solution (or an existing core util) would be preferred.
So now the question becomes: how in the heck do I read the state information of a FIFO, particularly the process(es) who has (have) the pipe open for reading and/or writing?
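On Linux, one place where this information is visible from the shell is /proc: each process's open descriptors appear as symlinks under /proc/PID/fd, and /proc/PID/fdinfo/FD records the flags each one was opened with. The sketch below is built on that and is only an illustration: `fifo_openers` and `fifo_has_reader` are hypothetical names, it sees only processes whose /proc entries the caller is allowed to read, it misses processes still blocked inside open (they hold no descriptor yet), and the answer can be out of date by the time it is used, which is the race the quoted comment warns about.

```shell
# Linux-only sketch: list the processes that currently hold a FIFO open by
# scanning /proc. The function names are hypothetical, not existing tools.
fifo_openers() {
    # $1 = path to the FIFO; prints "PID FD MODE" lines (MODE is r, w or rw)
    local target entry pid fd flags mode
    target=$(readlink -f -- "$1") || return 2
    for entry in /proc/[0-9]*/fd/*; do
        [ "$(readlink -- "$entry" 2>/dev/null)" = "$target" ] || continue
        pid=${entry#/proc/}; pid=${pid%%/*}
        fd=${entry##*/}
        # fdinfo reports the open(2) flags in octal; the last digit is the
        # access mode: 0 = O_RDONLY, 1 = O_WRONLY, 2 = O_RDWR.
        flags=$(sed -n 's/^flags:[[:space:]]*//p' "/proc/$pid/fdinfo/$fd" 2>/dev/null)
        case $flags in
            *0) mode=r ;;
            *1) mode=w ;;
            *2) mode=rw ;;
            *)  continue ;;
        esac
        printf '%s %s %s\n' "$pid" "$fd" "$mode"
    done
}

# Succeeds if some visible process has the FIFO open for reading
# (the pattern matches both the r and rw modes).
fifo_has_reader() {
    fifo_openers "$1" | grep -q ' r'
}
```

A writer could then call `fifo_has_reader "$pipe"` before writing, keeping in mind that the reader may vanish between the check and the write. The existing tools lsof and fuser report similar per-file information, if they are installed.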