
I have two scripts:

#!/bin/bash

netcat -lk -p 12345 | while read line
do
    match=$(echo $line | grep -c 'Keep-Alive')
    if [ $match -eq 1 ]; then
        [start a command]
    fi
done

and

#!/bin/bash

netcat -lk -p 12346 | while read line
do
    match=$(echo $line | grep -c 'Keep-Alive')
    if [ $match -eq 1 ]; then
        [start a command]
    fi
done

I've put the two scripts in '/etc/init.d/'.

When I restart my Linux machine (Raspberry Pi), both scripts work fine.

I've tested them about 20 times, and they keep working fine.

But after around 12 hours, the whole system stops working. I've added some logging, and it seems that the scripts are no longer reacting. But when I run:

ps aux

I can see that the scripts are still running:

root      1686  0.0  0.2   2740  1184 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script1.sh start
root      1689  0.0  0.1   2268   512 ?        S    Aug12   0:00 netcat -lk 12345
root      1690  0.0  0.1   2744   784 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script1.sh start
root      1691  0.0  0.2   2740  1184 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script2.sh start
root      1694  0.0  0.1   2268   512 ?        S    Aug12   0:00 netcat -lk 12346
root      1695  0.0  0.1   2744   784 ?        S    Aug12   0:00 /bin/bash /etc/init.d/script2.sh start

After a reboot they start working again... But that's a sin, rebooting a Linux machine periodically...

I've inserted some logging; here's the outcome:

Listening on [0.0.0.0] (family 0, port 12345)
[2013-08-14 11:55:00] Starting loop.
[2013-08-14 11:55:00] Starting netcat.
netcat: Address already in use
[2013-08-14 11:55:00] Netcat has stopped or crashed.
[2013-08-14 11:49:52] Starting loop.
[2013-08-14 11:49:52] Starting netcat.
Listening on [0.0.0.0] (family 0, port 12345)
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6333)
Connection closed, listening again.
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6334)
[2013-08-14 12:40:02] Starting loop.
[2013-08-14 12:40:02] Starting netcat.
netcat: Address already in use
[2013-08-14 12:40:02] Netcat has stopped or crashed.
[2013-08-14 12:17:16] Starting loop.
[2013-08-14 12:17:16] Starting netcat.
Listening on [0.0.0.0] (family 0, port 12345)
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6387)
Connection closed, listening again.
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6388)
[2013-08-14 13:10:08] Starting loop.
[2013-08-14 13:10:08] Starting netcat.
netcat: Address already in use
[2013-08-14 13:10:08] Netcat has stopped or crashed.
[2013-08-14 12:17:16] Starting loop.
[2013-08-14 12:17:16] Starting netcat.
Listening on [0.0.0.0] (family 0, port 12345)
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6167)
Connection closed, listening again.
Connection from [16.8.94.19] port 12345 [tcp/*] accepted (family 2, sport 6168)

Thanks

Dennis
  • I can't see an issue, but I don't know much about netcat. BUT you can reduce the number of processes you're creating by replacing `match=...fi` with `do ; if grep -q 'Keep-Alive' ; then start cmd; fi`. Good luck. – shellter Aug 07 '13 at 14:05
  • I've just tried that, but that stops everything from working... – Dennis Aug 07 '13 at 14:20
  • +1 "But that's a sin". I suspect, especially in light of the `-k` keep-alive flag on netcat, that the IP layer is bouncing after many hours, either through DHCP lease expiration or through "self-healing (i.e. reboot daily because it's easier than fixing bugs)" features of your etherswitch. Does `/var/log/syslog` give you any clue? – msw Aug 07 '13 at 14:25
  • Sorry that didn't work for you. As tech support is so fond of saying, "It works for me" ;-> If you want to add an edit to your post showing the exact code, I'll be happy to look at it. I agree with msw, especially about looking in /var/log/syslog (as I don't know that much about netcat). Good luck! – shellter Aug 07 '13 at 14:43
  • Good point on the DHCP lease time... I'm gonna test that. My lease time is 24 hours. What would be a proper solution? (I've checked /var/log/syslog but I can't find anything, uuuhhh, useful. But then again, I'm not really sure what I should look for... I'm not that good with Linux.) – Dennis Aug 07 '13 at 14:45
  • I wonder if placing it in a loop that sleeps about 4s before restarting netcat would be a good workaround. But of course it's still important that you find the real cause. It's probably not really related to netcat but to the interface itself or to outside connections. – konsolebox Aug 07 '13 at 14:57
  • I'm gonna see if it stops working after exactly 24 hours; then I'm pretty sure it's the DHCP lease. What exactly would that 4-second loop look like? Thanks! – Dennis Aug 07 '13 at 15:08
  • I did a `dhclient -r` and a `dhclient`, but it's still working... :/ – Dennis Aug 07 '13 at 15:11
  • I've ruled out DHCP, that's not it... – Dennis Aug 11 '13 at 12:35
  • If you have `strace` on your raspi you can try to attach to your process when it gets stuck and see what it's doing. Side note, bash has builtin regex matching operator `=~` so you don't need the `echo`+`grep` pair. – Jester Aug 11 '13 at 13:54
  • Jester, please tell me how exactly; I'm not that good with Linux... I was thinking about something like that using screen. But I'm a Linux noob; would you be so kind as to show me how? Thanks – Dennis Aug 11 '13 at 14:27
  • I think it would be interesting to see the output from the netcat sessions. Try adding the -v flag and redirecting the error output to a file. It should be something like `netcat -vlk 12345 2>>/var/netcaterr.out | while read line` ... After it has stopped working, have a look in /var/netcaterr.out and see what you find. – Bex Aug 12 '13 at 08:51
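Two points from these comments can be checked without netcat. A plausible explanation for why the `grep -q` variant stopped everything is that `grep` without a file argument reads from the loop's own stdin, so it swallows the following lines instead of testing the current one. Jester's `=~` operator tests only `$line` and starts no extra processes. The function names below are hypothetical, and the input comes from `printf` rather than netcat:

```shell
#!/bin/bash
# Sketch (hypothetical names): two ways of testing each line for
# "Keep-Alive". Both read lines from stdin and print the number of hits.

# Problem: grep has no file argument, so it reads from the loop's stdin
# (the same pipe `read` uses) and consumes the lines that follow.
count_with_bare_grep() {
    local hits=0 line
    while IFS= read -r line; do
        if grep -q 'Keep-Alive'; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits"
}

# Alternative: bash's built-in regex match (=~) tests only the current
# line and does not start any extra process.
count_with_regex() {
    local hits=0 line
    while IFS= read -r line; do
        if [[ $line =~ Keep-Alive ]]; then
            hits=$((hits + 1))
        fi
    done
    echo "$hits"
}
```

For example, `printf 'a\nKeep-Alive\nb\nKeep-Alive\n' | count_with_regex` prints 2, while the same input piped to `count_with_bare_grep` counts only one hit, because grep reads everything after the first line.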

6 Answers


If none of your commands, including netcat, reads input from stdin, you can make the script run completely independently of the terminal. Background processes that still depend on the terminal can get stopped (state T in ps) when they try to read input from it while in the background. Since you're running a daemon, you should make sure that none of your commands reads input from the terminal.

#!/bin/bash

set +o monitor # Make sure job control is disabled.

(
    : # Make sure the shell runs a subshell.
    exec netcat -lk -p 12345 | while IFS= read -r line  ## exec replaces the forked shell of this pipeline stage with netcat.
    do
        match=$(echo "$line" | grep -c 'Keep-Alive')
        if [ "$match" -eq 1 ]; then
            [start a command]
        fi
    done
) <&- >&- 2>&- </dev/null &>/dev/null &

TASKPID=$!
sleep 1s ## Let the task initialize a bit before we disown it.
disown "$TASKPID"

And I think we could try the logging thing again:

set +o monitor

(
    echo "[$(date "+%F %T")] Starting loop with PID $BASHPID."

    for (( ;; ))
    do
        echo "[$(date "+%F %T")] Starting netcat."

        netcat -vv -lk -p 12345 | while IFS= read -r line
        do
            match=$(echo "$line" | grep -c 'Keep-Alive')
            if [ "$match" -eq 1 ]; then
                [start a command]
            fi
        done

        echo "[$(date "+%F %T")] Netcat has stopped or crashed."

        sleep 4s
    done
) <&- >&- 2>&- </dev/null >> "/var/log/something.log" 2>&1 &

TASKPID=$!
sleep 1s
disown "$TASKPID"
konsolebox
  • So I should try the second one, with the logging? But shouldn't that also include the 'exec' to run in a subshell? – Dennis Aug 14 '13 at 07:35
  • Honestly, I'm not sure whether logging would again cause the scripts to stop, so I'd suggest trying the version with logging first and then the one without it. As for exec, don't worry, it won't be a problem: `()` already separates the block as a whole from its parent shell, and hopefully from the terminal's attributes as well. If it still doesn't work, I'd suggest using a different netcat, such as the original netcat or gnu-netcat. – konsolebox Aug 14 '13 at 07:50
  • Okay, I'll try the second one, with logging. Results tomorrow :) – Dennis Aug 14 '13 at 09:22
  • Results: it stops working after approximately one hour. Last line in the log: Connection from x.x.x.x port 12345 [tcp/*] accepted (family2, sport 6386) – Dennis Aug 14 '13 at 11:08
  • It didn't say "Netcat has stopped or crashed."? – konsolebox Aug 14 '13 at 11:13
  • I think it is netcat that's crashing. After one day, when the system is no longer working, I kill both netcat processes, and without doing anything else, the system starts working again... – Dennis Aug 15 '13 at 06:24
  • You should also check that the subshell connected to netcat (`| while read ...`) isn't stuck in an infinite wait, an infinite loop, or a deadlock. You can also add debug messages inside it with `echo "[$(date "+%F %T")] ." >&2` to find the last part of the command block that was reached before things crashed. If it really is netcat, perhaps you can install a newer or more stable version of it, either from a binary package or from source. – konsolebox Aug 15 '13 at 06:41
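konsolebox's last suggestion, marking the last position reached inside the `while read` block, can be wrapped in a small helper so every checkpoint gets the same timestamp format. This is a sketch: `debug_log`, `handle_lines` and the checkpoint texts are hypothetical names, and the messages go to stderr so they never mix with the data flowing through the pipe:

```shell
#!/bin/bash
# Hypothetical helper: print a timestamped checkpoint message to stderr,
# using the same date format as the logging examples in this thread.
debug_log() {
    printf '[%s] %s\n' "$(date '+%F %T')" "$*" >&2
}

# Hypothetical loop body with checkpoints around each step. The ":" line
# is only a placeholder marking where [start a command] would go.
handle_lines() {
    local line
    while IFS= read -r line; do
        debug_log "read: $line"
        if [[ $line == *Keep-Alive* ]]; then
            debug_log "match, starting command"
            : placeholder for the real command
            debug_log "command returned"
        fi
    done
    debug_log "input closed"
}
```

With `netcat ... | handle_lines 2>>/var/log/something.log`, the last checkpoint in the log shows whether the loop was waiting for input, running the command, or had already seen the end of netcat's output.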

The loop could look like this:

#!/bin/bash

for (( ;; ))
do
    netcat -lk -p 12345 | while read line
    do
        match=$(echo "$line" | grep -c 'Keep-Alive')
        if [ "$match" -eq 1 ]; then
            [start a command]
        fi
    done
    sleep 4s
done

I've added double quotes around the variables to keep it safer.

You could also try capturing errors and adding some logging with this format:

#!/bin/bash

{
    echo "[$(date "+%F %T")] Starting loop."

    for (( ;; ))
    do
        echo "[$(date "+%F %T")] Starting netcat."

        netcat -lk -p 12345 | while read line
        do
            match=$(echo "$line" | grep -c 'Keep-Alive')
            if [ "$match" -eq 1 ]; then
                [start a command]
            fi
        done

        echo "[$(date "+%F %T")] Netcat has stopped or crashed."

        sleep 4s
    done
} >> "/var/log/something.log" 2>&1

Your read command would also be better in this form, since it reads lines unmodified:

... | while IFS= read -r line
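The difference shows up with a line that has leading spaces and a backslash. A small demo, where `show_plain_read` and `show_raw_read` are hypothetical names:

```shell
#!/bin/bash
# Plain `read` strips leading/trailing IFS whitespace and treats a
# backslash as an escape character (hypothetical demo function).
show_plain_read() {
    local line
    read line
    printf '%s\n' "$line"
}

# `IFS= read -r` keeps the line exactly as received (hypothetical demo).
show_raw_read() {
    local line
    IFS= read -r line
    printf '%s\n' "$line"
}
```

`printf '  a\\b  \n' | show_plain_read` prints `ab`, while `show_raw_read` prints the line unchanged, spaces and backslash included.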

Some might also suggest using process substitution, but I don't recommend it this time: with the `| while ...` method, the while loop runs in a subshell, which keeps the outer for loop safe in case it crashes. Besides, no variable from the while loop is needed outside of it.
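The subshell behaviour this relies on can be shown directly: a variable changed inside `... | while ...` is lost when the loop ends, while the same loop fed by process substitution runs in the current shell. The function names are hypothetical:

```shell
#!/bin/bash
# With a pipe, bash runs the while loop in a subshell (unless the
# `lastpipe` option is enabled with job control off), so the changes
# to `count` are discarded when the loop finishes.
count_via_pipe() {
    local count=0 line
    printf 'a\nb\n' | while IFS= read -r line; do
        count=$((count + 1))
    done
    echo "$count"
}

# With process substitution the loop runs in the current shell,
# so the final value of `count` is still visible afterwards.
count_via_process_substitution() {
    local count=0 line
    while IFS= read -r line; do
        count=$((count + 1))
    done < <(printf 'a\nb\n')
    echo "$count"
}
```

The first function prints 0 and the second prints 2. The same isolation is what the pipe form buys here: an `exit` or a fatal error inside the piped loop only ends that subshell, and the outer `for` loop carries on.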

I now suspect that the issue might be related to the input and how the `while read line; do ...; done` block handles it, not to netcat itself. Your variables not being properly quoted with "" could be one factor, or possibly even the actual reason your netcat is crashing.
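One concrete way an unquoted `$line` misbehaves: it undergoes word splitting and pathname expansion, so a received line containing `*` is replaced by the file names in the current directory. This sketch uses a throwaway directory from `mktemp -d`; `demo_unquoted_glob` is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical demo: compare `echo $line` (unquoted) with `echo "$line"`
# for a line containing a glob character, inside a temporary directory.
demo_unquoted_glob() {
    local dir line='*'
    dir=$(mktemp -d) || return 1
    touch "$dir/file1" "$dir/file2"
    (
        cd "$dir" || exit 1
        echo $line      # unquoted: the * expands to the file names
        echo "$line"    # quoted: prints the literal *
    )
    rm -rf -- "$dir"
}
```

It prints `file1 file2` for the unquoted form and `*` for the quoted one. With netcat feeding arbitrary network data into `$line`, the quoted form is the only one that passes the line on as received.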

konsolebox
  • Good stuff! I added some logging, and an extra loop in case netcat stops... I'm gonna try this right now; if it works, I'll give you the points! Thanks! – Dennis Aug 12 '13 at 11:15
  • If you're not running it as root and your user has no write permission on the log directory, perhaps you could just use the home directory: `} >> ~/something.log 2>&1`, or create the file as root with write permission for the user: `touch /var/log/something.log; chown youruser:yourusersgroup /var/log/something.log; chmod 644 /var/log/something.log # or 600 at your preference` – konsolebox Aug 12 '13 at 11:22
  • I'm running as root, so it's all fine. I've also added a log line for when the event is triggered. I really think we're heading in the right direction here: we can now pinpoint in which phase it's crashing. I've also found out that during boot the script gives an error 3 times and restarts 3 times. Thanks for helping me with the syntax; if it were PowerShell, it would be no problem for me, but I'm not that good with Linux... – Dennis Aug 12 '13 at 12:24
  • I remember. You could also get more messages with netcat's -v option. It sends its verbose messages only to stderr (fd 2), not to the pipe, so it won't affect the loop. `netcat -vv -lk -p 12345 | while IFS= read -r line` – konsolebox Aug 12 '13 at 15:05
  • script2 doesn't have a start function; that's why. – Dru Aug 17 '13 at 21:37
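konsolebox's point that `-v` output does not disturb the loop follows from how pipes work: `|` connects only stdout (fd 1), so anything written to stderr (fd 2) bypasses `while read`. In this sketch a hypothetical stand-in function, `fake_netcat`, replaces netcat and reuses a line from the question's log as its diagnostic message:

```shell
#!/bin/bash
# Hypothetical stand-in for `netcat -v`: data on stdout, diagnostics on stderr.
fake_netcat() {
    echo "Listening on [0.0.0.0] (family 0, port 12345)" >&2
    echo "GET / HTTP/1.1"
    echo "Connection: Keep-Alive"
}

# Count the lines that actually reach the loop through the pipe.
count_piped_lines() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}
```

`fake_netcat | count_piped_lines` prints 2: the two data lines arrive, and the "Listening on" message goes to the terminal instead. In the real script, the stderr side can be appended to a file (for example `2>>/var/netcaterr.out`, as suggested in the comments on the question) without changing what the loop sees.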

You mentioned "after around 12 hours, the whole system stops working". It is likely that the scripts are executing whatever you have in [start a command] and that it is bloating memory. Are you sure [start a command] isn't forking many processes very frequently without releasing memory?
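To test this theory instead of guessing, one option is to log the process count and free memory at intervals and check whether they drift steadily over those 12 hours. A minimal sketch, assuming Linux `/proc`; the function name and log format are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print one line with a timestamp, the number of
# processes (numeric entries in /proc) and MemFree in kB. Linux-only.
resource_snapshot() {
    local p procs=0 key value memfree=unknown
    for p in /proc/[0-9]*; do
        procs=$((procs + 1))
    done
    while read -r key value _; do
        if [ "$key" = "MemFree:" ]; then
            memfree=$value
        fi
    done < /proc/meminfo
    printf '[%s] procs=%d memfree_kb=%s\n' "$(date '+%F %T')" "$procs" "$memfree"
}
```

Calling it every few minutes (from a loop with `sleep`, or from cron) and appending the output to a log makes a leak visible as a process count that keeps rising or free memory that keeps falling.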

SSaikia_JtheRocker
  • Good point. To rule this out, I'll have to remove the command and echo to a log file instead, to see if it keeps working without my startup script. – Dennis Aug 12 '13 at 07:51
  • So you are saying you have removed the [start a command] part and your script still doesn't respond after 12 hours? – SSaikia_JtheRocker Aug 14 '13 at 08:38
  • Yup, I've also tried it with logging: a log entry before starting the command, and another one when it comes back. – Dennis Aug 14 '13 at 09:21
  • What I was saying is to remove the command you are using in [start a command] entirely. That way you will know whether the command is bloating your system. – SSaikia_JtheRocker Aug 14 '13 at 09:25

I have often experienced strange behaviour with nc or netcat. You should have a look at ncat; it's almost the same tool, but it behaves the same on all platforms (nc and netcat behave differently depending on the distribution or OS: Linux, BSD, Mac).

hashier

Periodically netcat will print, not a line, but a block of binary data. The read builtin will likely fail as a result.

I think you're using this program to verify that a remote host is still connected to ports 12345 and 12346 and hasn't been rebooted.

My solution for you is to pipe the output of netcat to sed, then pipe that (much reduced) line to the read builtin...

#!/bin/bash

{
    echo "[$(date "+%F %T")] Starting loop."

    for (( ;; ))
    do
        echo "[$(date "+%F %T")] Starting netcat."

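        # -u (GNU sed) writes each line out immediately; without it, sed
        # block-buffers output to a pipe and the loop sees lines late.
        netcat -lk -p 12345 | sed -u 's/.*Keep-Alive.*/Keep-Alive/' |
        while IFS= read -r line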
        do
            match=$(echo "$line" | grep -c 'Keep-Alive')
            if [ "$match" -eq 1 ]; then
                [start a command]
            fi
        done

        echo "[$(date "+%F %T")] Netcat has stopped or crashed."

        sleep 4s
    done
} >> "/var/log/something.log" 2>&1
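The sed filtering step can be tried on its own. Once sed has rewritten every line containing `Keep-Alive` to exactly `Keep-Alive`, the loop can compare whole lines instead of starting a `grep` per line. This is a sketch with a hypothetical function name, fed by `printf` instead of netcat, and it assumes GNU sed for `-u`:

```shell
#!/bin/bash
# Hypothetical sketch: normalise every line containing Keep-Alive
# (including HTTP header lines ending in \r) to the bare word, then
# count exact matches in the loop. Assumes GNU sed for -u.
count_keepalive_lines() {
    sed -u 's/.*Keep-Alive.*/Keep-Alive/' | {
        hits=0
        while IFS= read -r line; do
            if [ "$line" = "Keep-Alive" ]; then
                hits=$((hits + 1))
            fi
        done
        echo "$hits"
    }
}
```

Because sed's `.` also matches a trailing carriage return, a header such as `Connection: Keep-Alive\r` still collapses to `Keep-Alive`, which a plain string comparison would otherwise miss.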

Also, you'll need to review some of the other startup scripts in /etc/init.d to make sure yours are compatible with whatever version of rc the system uses, though it would be much easier to call your script2.sh from a copy of some simple script in init.d. As it stands, script2 is itself the startup script but doesn't conform to the init package you use.

That sounds more complicated than I mean... Let me explain better:

/etc/init.d/syslogd        ## a standard init script that calls syslogd
/etc/init.d/start-monitor   ## a copy of a standard init script that calls script2.sh

As an additional note, I think you could bind netcat to the specific local IP address that the monitored host connects to, instead of binding it to the wildcard address 0.0.0.0.

Dru
  • ksh has a `read -r` (raw), and maybe there is a `-b` binary too. Not sure about `bash`. Good luck to all. – shellter Aug 15 '13 at 23:25

Depending on your netcat variant, you may not be allowed to use the -p option when waiting for an incoming connection request (see the nc man page); the hostname and port are then the last two arguments on the command line.

Maybe it connects to its own port, and after some hours some resource runs out?

tue
  • Good point, but I had also noticed that when reading the netcat man pages. I've removed the -p option, but that's not helping. What do you mean by your second remark? – Dennis Aug 07 '13 at 15:29
  • That's only a sneaking suspicion; I've never used netcat this way. The -p option is used for outgoing connection requests, so my vague suspicion is that netcat could try to initiate some request because of that parameter setting. But you say it makes no difference. – tue Aug 09 '13 at 06:24