
The goal was to frequently change the default outgoing source IP on a machine with multiple interfaces and live IPs.

I used `ip route replace default`, as per its documentation, and let a script run it in a loop at a fixed interval. It changes the source IP fine for a while, but then the machine loses all internet access. It has to be rebooted remotely from a web interface before anything works again.

Is there anything that could prevent this from working stably? I have tried this on more than one server.

Following is a minimal example:

# extract all currently active source ips except loopback
IPs="$(ifconfig  | grep 'inet addr:'| grep -v '127.0.0.1' | cut -d: -f2 |
awk '{ print $1}')"

read -a ip_arr <<<$IPs

# extract all currently active mac / ethernet addresses
Int="$(ifconfig  | grep 'eth'| grep -v 'lo' | awk '{print $1}')"
read -a eth_arr <<<$Int

ip_len=${#ip_arr[@]}
eth_len=${#eth_arr[@]}

i=0;
e=0;

while(true); do

    #ip route replace 0.0.0.0 dev eth0:1 src 192.168.1.18
    route_cmd="ip route replace 0.0.0.0 dev ${eth_arr[e]} src ${ip_arr[i]}"
    echo $route_cmd
    eval $route_cmd

    sleep 300

    (i++)
    (e++)

    if [ $i -eq $ip_len ]; then
        i=0;
        e=0;
        echo "all ips exhausted - starting from first again"
    #   break;
    fi

done
fkl
  • `(true)` is an error (harmless, but an error nonetheless): it spawns a pointless sub-shell. You are missing an opening `(` on the `i++` line (I assume that's a transcription typo, though). Don't mess with `eval` when you don't have to (and you don't have to here). If you want to see the command that is run, you can use `set -x`/`set +x` or `set -v`/`set +v` around the `ip route` line to get the shell to print out the command it runs (`-x`) or the line it reads (`-v`). Where in the process do things break? The end of the first pass? Randomly? What is the state of the routes when it happens? – Etan Reisner Jun 12 '15 at 19:47
  • Are all the interfaces on the same LAN (e.g., 192.168.1.0/24?) – RTLinuxSW Jun 12 '15 at 20:37
  • Thanks guys, and excuse the typos; the script works fine even for days when run in the background. For the first 10 to 15 iterations (where the IP is changed around 15 times) everything works fine. After that, at some random point, internet reachability is lost from this machine. Because it's a remote server, I cannot ping any of the interfaces either, so I have to reboot it from an HTTP-based control panel to get SSH access back. Yes, they are all on the same LAN, but all are live IPs. – fkl Jun 12 '15 at 20:50
  • At the point when it breaks, the routing table looks perfectly okay, i.e. the last source IP set appears as the default one. It is internet reachability that becomes the problem. My hunch was that frequently changing the source IP within the same subnet might affect the routing table entries on the intermediate routers. – fkl Jun 12 '15 at 20:53
  • Err, I thought I had fixed the posted code. I took note of the recommendations regarding eval and the use of true, but they are certainly not related to the problem I asked about. No, I don't need nested loops. Why would one want to use an IP with a different MAC? You should only use an IP which is tied to a real MAC. The problem still occurs when I put a delay of up to 20 minutes between swaps, though it happens proportionally later, i.e. after almost the same number of changes (10 to 12) to the source IP. – fkl Jun 13 '15 at 17:23
  • That's kind of strange, and it somehow gives me an impression of why SO might be going a bit out of fashion. A user with a pretty decent ratio of answers to questions (meaning someone who often helps others and rarely asks questions) posted a script with a question. There was a minor typo or two, but all that came back was advice on improving the script or pointing out those typos (which I value, but which plays no role in solving the problem at hand). I wonder what effect this would have on SO as a community for helping people versus one used merely for point scoring. – fkl Jun 14 '15 at 19:45
  • You say the errors aren't important but they could indicate any number of other things are going on (in code we can't see). Does it always break at the same iteration count? What *specifically* happens just before it breaks? Add `set -x` or `set -v` (or both) to the script, pipe the output to a file, run it until it breaks a few times and then see what was happening **immediately** before it broke. Do you have other machines on the same local network so you could see if all connectivity is broken? (You also failed to mention me and I just came back to this question for the first time now.) – Etan Reisner Jun 21 '15 at 22:53
  • Can the machine still reach its gateway at the point where it breaks? Can it talk to itself over loopback? Are the interfaces filling up with collisions? Are the network buffers filling up with unsent/unread packets? What else can you get the script to dump at you about the general networking status of the system when it breaks? – Etan Reisner Jun 21 '15 at 22:54
  • I think, what you are really looking for is [link aggregation](http://en.wikipedia.org/wiki/Link_aggregation)... Check if that's what you want... – anishsane Jun 22 '15 at 13:21
  • Why do you need these tricks? If I had to do something similar, I would mark odd and even connections in iptables and then route them based on the mark. I guess UDP traffic would suffer from such frequent changes. Would it be enough to SNAT packets with iptables via one interface? Maybe you should instead ask a question about how to reach your goal without your script. To avoid losing connectivity, try adding a separate routing rule for your own IP. – user3132194 Jun 25 '15 at 10:51
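
The script-level suggestions from the comments above (no `eval`, no `(true)` sub-shell, proper arithmetic, and a visible trace of each command) can be put together roughly as follows. This is a minimal sketch, not the asker's script: the `DRY_RUN` and `MAX_ROUNDS` variables, the `dev=address` argument format, and the helper names are assumptions made for illustration, and it uses `default` as the route prefix, as the question's prose describes. It does not address the connectivity loss itself.

```bash
#!/usr/bin/env bash
# Sketch: rotate the source address of the default route without eval.
# DRY_RUN (default 1) and MAX_ROUNDS are hypothetical knobs for this example.

# Print the command; execute it only when DRY_RUN=0.
run_cmd() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" -eq 0 ]; then
        "$@"
    fi
}

# rotate_sources INTERVAL DEV=ADDR [DEV=ADDR ...]
# Cycles through the given pairs forever, or stops after MAX_ROUNDS changes.
rotate_sources() {
    if [ "$#" -lt 2 ]; then
        echo "usage: rotate_sources INTERVAL DEV=ADDR [DEV=ADDR ...]" >&2
        return 1
    fi
    local interval="$1" rounds=0 pair dev src
    shift
    while true; do
        for pair in "$@"; do
            dev="${pair%%=*}"   # text before the first '='
            src="${pair#*=}"    # text after the first '='
            run_cmd ip route replace default dev "$dev" src "$src"
            rounds=$((rounds + 1))
            if [ -n "${MAX_ROUNDS:-}" ] && [ "$rounds" -ge "$MAX_ROUNDS" ]; then
                return 0
            fi
            sleep "$interval"
        done
    done
}

# Example (placeholder addresses), printing commands only:
#   MAX_ROUNDS=4 rotate_sources 300 eth0=192.0.2.10 eth1=192.0.2.11
```

Passing the command as separate words to `run_cmd` keeps quoting intact, which is the main reason `eval` is unnecessary here.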

1 Answer


I wanted to comment, but as I don't have enough points, it won't let me.

Consider:

  • Does varying the delay before each run change the number of iterations before it fails?
  • Export the ifconfig output and the routes every time you change the route, to see whether anything is meaningfully different over time. Maybe add some basic tests (ping, nslookup, etc.). Basically, find out what exactly is going wrong. Also log the commands you send to a file (one text file per change?) to see whether any of them differ after x iterations.
  • What connectivity is lost? Incoming? Outgoing? Specific applications?
  • You say you do this on other servers without problems?
  • Are the IPs static (/etc/network/interfaces), bootp/DHCP, or semi-static (a bootp/DHCP server assigning them based on MAC address)? If they are served by bootp/DHCP, what is the lease duration?

On the last remark: bootp/DHCP hands out IPs for a fixed duration, say 60 minutes. After half that time the client checks with the bootp/DHCP server whether it can keep the IP and extends the lease to 60 minutes again. This can mean a small reconfiguration of the interface (maybe even at the same time as your script runs?).
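
The logging suggested in the list above could look something like the sketch below, called once before and once after every route change. The function name, the default log path, and the optional probe target are assumptions for illustration; it uses `ip` rather than `ifconfig`, and a failing command is recorded in the log instead of aborting the snapshot.

```bash
#!/usr/bin/env bash
# Sketch: append a timestamped snapshot of addresses and routes to a log file.
# The default log path and the optional ping target are hypothetical.

# snapshot [LOG_FILE] [PING_TARGET]
snapshot() {
    local log="${1:-/tmp/route-rotation.log}" target="${2:-}"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        echo "--- addresses ---"
        ip -4 addr show 2>&1 || echo "(ip addr failed)"
        echo "--- routes ---"
        ip route show 2>&1 || echo "(ip route failed)"
        if [ -n "$target" ]; then
            echo "--- ping $target ---"
            # 3 probes, 2-second reply timeout (Linux ping options)
            ping -c 3 -W 2 "$target" 2>&1 || echo "(ping failed)"
        fi
    } >>"$log"
}

# Example: snapshot /var/log/route-rotation.log 192.0.2.1
```

Comparing the snapshot taken just before connectivity is lost with an earlier one is a quick way to check whether a DHCP renewal changed an address or route underneath the script.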

hth

Morph
  • Thank you. I have tried most of the recommendations. a) No. b) I was already logging to a file. The routing tables etc. are exactly what they should be after each change. c) The primary connectivity loss is outgoing. But because the machine is remote and I am SSHing into it, this causes my SSH connection to freeze as well. – fkl Jun 26 '15 at 14:23
  • d) No, I actually tried this on two different servers, each with 4-5 interfaces and different live IPs, and the outcome was identical. e) The IPs are DHCP based, but the lease duration is much longer. I will reevaluate each of those, including your last point, in case I missed something. Appreciate that. – fkl Jun 26 '15 at 14:24
  • Hmm, outgoing not working can mean two things: 1. internal traffic cannot find its way "out", or 2. external equipment/software is getting confused. – Morph Jun 28 '15 at 06:02
  • @fayyazkl I'd look for independent ways to check connections, like a ping to some distant host. If ping works but an application does not (you can test with wget, for example), note that ping uses ICMP, not TCP or UDP, which means it is often not touched by firewalls or whatever. Losing your SSH connection is understandable, because the script basically tells the server to use a different IP to push out packets. If the return path doesn't know the connection is going to be shifted to a different source IP, that may be a problem. Also, are all the IPs listed in DNS? – Morph Jun 28 '15 at 06:17
  • @fayyazkl Also, rereading the script, I noticed that the IPs are only pulled once, at the start of the script. But if the IPs are DHCP based and one of them changed... I'd probably change the while loop to include a check that the IP is still being leased. If not, exit the loop and rerun the script (maybe wrap the whole script in a while loop, with the current one inside it). – Morph Jun 28 '15 at 06:26
  • Thank you. For the sake of keeping focus: the IPs are still the same. Pinging another node was already tried and it does not work, i.e. outgoing access from the machine is gone. Sure, I understood the problem with the existing SSH connection, but I was expecting to be able to initiate it again. None of the things listed is something that I haven't tried. In fact I also used iptables (iptables -t nat -A POSTROUTING -j SNAT --to ) to see if that made any difference, and the results were pretty much the same. – fkl Jun 28 '15 at 08:58
  • Well, if the configuration is good after the changes, it seems to me that there is some strange bug going on where the return path of packets becomes asymmetric, which may confuse the applications/services on top. I would probably run nfdump on the host from a minute before until a minute after the change, to see the actual packet source/destination information, which interface the replies arrive on, etc., and diagnose from there what exactly goes wrong. (Maybe get a trial version of Riverbed's SteelCentral Packet Analyzer Personal Edition to analyze it; it's a beefed-up version of Wireshark, though it is a Windows app.) – Morph Jun 29 '15 at 10:40
  • Thanks, will try that out. – fkl Jun 29 '15 at 14:15
  • For anyone who sees this later: the real issue was that routing doesn't change packet content; it just chooses the right interface to forward the packet to. So this won't change the source IP address you got. The packet will go to the right routing table and the right interface, but that's all. To change the source address you need NAT, so you need to masquerade in POSTROUTING: http://serverfault.com/a/637152/124005. – fkl Jul 18 '15 at 05:14
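
As a rough illustration of the NAT approach described in the last comment, the sketch below swaps an explicit SNAT rule in the `nat` table's POSTROUTING chain. It is an assumption-laden example, not the rule set the asker ended up with: the interface name and addresses are placeholders, `DRY_RUN` and the helper names are hypothetical, and it uses `SNAT --to-source` where the linked answer's `MASQUERADE` target would instead pick the outgoing interface's address automatically.

```bash
#!/usr/bin/env bash
# Sketch: change the source address of outgoing packets with an SNAT rule.
# Interface and addresses are placeholders; DRY_RUN (default 1) is hypothetical.

# Print the command; execute it only when DRY_RUN=0 (requires root).
apply() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" -eq 0 ]; then
        "$@"
    fi
}

# set_snat_source DEV NEW_ADDR [OLD_ADDR]
# Deletes the rule for OLD_ADDR (if given), then appends one for NEW_ADDR.
set_snat_source() {
    local dev="$1" new_src="$2" old_src="${3:-}"
    if [ -n "$old_src" ]; then
        apply iptables -t nat -D POSTROUTING -o "$dev" -j SNAT --to-source "$old_src"
    fi
    apply iptables -t nat -A POSTROUTING -o "$dev" -j SNAT --to-source "$new_src"
}

# Example, printing commands only:
#   set_snat_source eth0 192.0.2.11 192.0.2.10
```

NAT rules are consulted when a connection is first seen, so already-established connections keep their old source address; only new connections pick up the changed rule.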