I am experimenting with MetalLB in L2 mode and iptables routing on an Ubuntu 22.04 system with two interfaces.

I have ens160 (on all nodes, master and worker) for all local traffic and ens192 (only on my worker), through which MetalLB has access to my public IP network. I configured MetalLB to use only the worker nodes where ens192 is available. Ubuntu 22.04 uses netplan by default, so I used it to set up a few rules for the interface ens192.

The interface ens192 has no IP address configured directly. According to the MetalLB and kube-proxy documentation, running kube-proxy in IPVS mode with strict ARP is the way it should work, and the IPs should be announced using ARP. As ingress I am using nginx, which successfully gets an IP assigned by MetalLB. When checking the dummy interface kube-ipvs0, I can see the assigned IP address.

kubectl -n nginx-ingress get svc
NAME                                 TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.249.73.29    1.2.3.17      80:30311/TCP,443:31050/TCP   21d
ingress-nginx-controller-admission   ClusterIP      10.247.82.117   <none>        443/TCP                      21d

From inside the cluster I can access the service, but not from outside; the connection times out.

My routing rules, set with netplan, are as follows:


network:
  version: 2
  renderer: networkd
  ethernets:
    ens160:
      dhcp4: no
      dhcp6: no
      addresses:
        - 172.31.16.20/24
      routes:
      - to: default
        via: 172.31.16.254
      nameservers:
        addresses:
          - 10.2.2.2
          - 10.7.2.2
        search:
          - esrv.local
    ens192:
      dhcp4: no
      dhcp6: no
      routing-policy:
      - from: 1.2.3.16/28
        table: 1019
        priority: 100
      - from: 1.2.3.16/28
        to: 192.168.0.0/16
        priority: 99
      routes:
      - to: default        
        via: 1.2.3.30
        table: 1019
      - to: 1.2.3.16/28
        table: 1019
      - to: 1.2.3.16/28

Route information:

ip rule show
0: from all lookup local
99: from 1.2.3.16/28 to 192.168.0.0/16 lookup main proto static
100: from 1.2.3.16/28 lookup 1019 proto static
32766: from all lookup main
32767: from all lookup default
ip route list
default via 172.31.16.254 dev ens160 proto static
172.31.16.0/24 dev ens160 proto kernel scope link src 172.31.16.20
192.168.135.64/26 via 192.168.135.65 dev vxlan.calico onlink
blackhole 192.168.177.192/26 proto 80
192.168.177.232 dev calid7e72cc188e scope link
192.168.177.233 dev cali3542ba50312 scope link
192.168.177.234 dev cali101d1e0fb1d scope link
1.2.3.16/28 dev ens192 proto static scope link
ip route list table 1019
default via 1.2.3.30 dev ens192 proto static onlink
1.2.3.16/28 dev ens192 proto static scope link

When I remove the rule "100: from 1.2.3.16/28 lookup 1019 proto static", I can see that the traffic gets routed out through ens160, which is expected in this case because of the default route.


tcpdump -n -e -q -vvvvv -i any port 80

tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12:05:07.961685 ens192 In ifindex 3 70:70:8b:1d:6a:bf (tos 0x0, ttl 58, id 63476, offset 0, flags [DF], proto TCP (6), length 60)
[CLIENT PUB IP].10400 > 1.2.3.17.80: tcp 0
12:05:07.961967 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee (tos 0x0, ttl 57, id 63476, offset 0, flags [DF], proto TCP (6), length 60)
172.31.16.20.14633 > 192.168.177.233.80: tcp 0
12:05:07.962018 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7 (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
192.168.177.233.80 > 172.31.16.20.14633: tcp 0
12:05:07.962062 ens160 Out ifindex 2 00:50:56:a6:1e:38 (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
1.2.3.17.80 > [CLIENT PUB IP].10400: tcp 0
12:05:07.962290 ens160 In ifindex 2 54:75:d0:5b:10:fc (tos 0x0, ttl 255, id 43249, offset 0, flags [none], proto TCP (6), length 40)
[CLIENT PUB IP].10400 > 1.2.3.17.80: tcp 0
12:05:07.962344 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee (tos 0x0, ttl 254, id 43249, offset 0, flags [none], proto TCP (6), length 40)
172.31.16.20.14633 > 192.168.177.233.80: tcp 0
^C
6 packets captured
8 packets received by filter
0 packets dropped by kernel
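
For comparison, here is a minimal sketch of what the policy rules and tables listed under "Route information" would select on their own. It is a simplified model, not the kernel's implementation: it ignores NAT, conntrack and IPVS, which in practice decide which source address the routing lookup actually sees. The client address 203.0.113.10 is a placeholder for [CLIENT PUB IP] and does not come from the post.

```python
import ipaddress

# Policy rules from "ip rule show" as (priority, from, to, table); None means "all".
# The "local" and "default" tables are omitted because they do not affect this case.
RULES = [
    (99, "1.2.3.16/28", "192.168.0.0/16", "main"),
    (100, "1.2.3.16/28", None, "1019"),
    (32766, None, None, "main"),
]

# Routes from "ip route list" and "ip route list table 1019" as (prefix, device),
# reduced to the entries relevant here.
TABLES = {
    "main": [
        ("0.0.0.0/0", "ens160"),
        ("172.31.16.0/24", "ens160"),
        ("192.168.177.233/32", "cali3542ba50312"),
        ("1.2.3.16/28", "ens192"),
    ],
    "1019": [
        ("0.0.0.0/0", "ens192"),
        ("1.2.3.16/28", "ens192"),
    ],
}


def lookup(src, dst, rules=RULES):
    """Return (table, device) chosen for a packet, or None if nothing matches."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for _prio, from_net, to_net, table in sorted(rules, key=lambda r: r[0]):
        if from_net and src_ip not in ipaddress.ip_network(from_net):
            continue
        if to_net and dst_ip not in ipaddress.ip_network(to_net):
            continue
        # Longest-prefix match inside the selected table.
        best = None
        for prefix, dev in TABLES[table]:
            net = ipaddress.ip_network(prefix)
            if dst_ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, dev)
        if best is not None:
            return table, best[1]
        # No route in this table: fall through to the next rule.
    return None


CLIENT = "203.0.113.10"  # placeholder for [CLIENT PUB IP], not from the post

with_rule = lookup("1.2.3.17", CLIENT)
without_rule = lookup("1.2.3.17", CLIENT, [r for r in RULES if r[0] != 100])
print("with rule 100:", with_rule)
print("without rule 100:", without_rule)
```

In this model a reply sourced from 1.2.3.17 selects table 1019 and ens192 when rule 100 is present, and the main table and ens160 when it is removed. If a capture shows something else, the packet probably did not carry the assumed source address at the moment of the routing lookup.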

But when I add the rule "100: from 1.2.3.16/28 lookup 1019 proto static" back, it seems to use the routing table, yet I can't see the traffic being routed out.

tcpdump -n -e -q -vvvvv -i any port 80

tcpdump: data link type LINUX_SLL2
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
12:08:26.444843 ens192 In ifindex 3 70:70:8b:1d:6a:bf (tos 0x0, ttl 58, id 20253, offset 0, flags [DF], proto TCP (6), length 60)
[CLIENT PUB IP].10400 > 1.2.3.17.80: tcp 0
12:08:26.444975 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee (tos 0x0, ttl 57, id 20253, offset 0, flags [DF], proto TCP (6), length 60)
172.31.16.20.38026 > 192.168.177.233.80: tcp 0
12:08:26.445009 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7 (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
192.168.177.233.80 > 172.31.16.20.38026: tcp 0
12:08:27.467228 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7 (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
192.168.177.233.80 > 172.31.16.20.38026: tcp 0
12:08:27.492653 ens192 In ifindex 3 70:70:8b:1d:6a:bf (tos 0x0, ttl 58, id 20254, offset 0, flags [DF], proto TCP (6), length 60)
[CLIENT PUB IP].10400 > 1.2.3.17.80: tcp 0
12:08:27.492742 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee (tos 0x0, ttl 57, id 20254, offset 0, flags [DF], proto TCP (6), length 60)
172.31.16.20.38026 > 192.168.177.233.80: tcp 0
12:08:27.492773 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7 (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
192.168.177.233.80 > 172.31.16.20.38026: tcp 0
^C
7 packets captured
9 packets received by filter
0 packets dropped by kernel
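
To compare the two captures more easily, the per-packet header lines can be reduced to a list of interface/direction hops. The following is a small sketch under the assumption that the input has the "tcpdump -e -i any" line format shown above; the sample lines are shortened copies of the header lines from the two captures, and the function names are made up for illustration.

```python
import re

# Header line of "tcpdump -e -i any" output, for example:
# "12:08:26.444843 ens192 In ifindex 3 70:70:8b:1d:6a:bf (tos ..."
HEADER = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d+\s+(\S+)\s+(\S+)\s+ifindex\s+\d+")


def hops(capture):
    """Return the (interface, direction) pair of every packet header line."""
    result = []
    for line in capture.splitlines():
        match = HEADER.match(line.strip())
        if match:
            result.append((match.group(1), match.group(2)))
    return result


def uplinks_used_for_egress(capture, uplinks=("ens160", "ens192")):
    """Return the node uplinks on which at least one packet left the node."""
    return sorted({iface for iface, direction in hops(capture)
                   if direction == "Out" and iface in uplinks})


# Shortened header lines from the capture without rule 100.
WITHOUT_RULE = """\
12:05:07.961685 ens192 In ifindex 3 70:70:8b:1d:6a:bf
12:05:07.961967 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee
12:05:07.962018 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7
12:05:07.962062 ens160 Out ifindex 2 00:50:56:a6:1e:38
12:05:07.962290 ens160 In ifindex 2 54:75:d0:5b:10:fc
12:05:07.962344 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee
"""

# Shortened header lines from the capture with rule 100.
WITH_RULE = """\
12:08:26.444843 ens192 In ifindex 3 70:70:8b:1d:6a:bf
12:08:26.444975 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee
12:08:26.445009 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7
12:08:27.467228 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7
12:08:27.492653 ens192 In ifindex 3 70:70:8b:1d:6a:bf
12:08:27.492742 cali3542ba50312 Out ifindex 6 ee:ee:ee:ee:ee:ee
12:08:27.492773 cali3542ba50312 In ifindex 6 e6:d1:f8:03:b9:b7
"""

print("without rule 100:", uplinks_used_for_egress(WITHOUT_RULE))
print("with rule 100:", uplinks_used_for_egress(WITH_RULE))
```

Without the rule, the reply leaves through ens160. With the rule, the SYN-ACK from the pod arrives on the cali interface but never shows up as "Out" on either uplink.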

IP Info:

2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a6:1e:38 brd ff:ff:ff:ff:ff:ff
altname enp3s0
inet 172.31.16.20/24 brd 172.31.16.255 scope global ens160
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fea6:1e38/64 scope link
valid_lft forever preferred_lft forever
3: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:a6:8d:f5 brd ff:ff:ff:ff:ff:ff
altname enp11s0
inet6 fe80::250:56ff:fea6:8df5/64 scope link
valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
inet 1.2.3.17/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever

Versions:

Ubuntu 22.04 with kernel 5.15.0-76-generic
Kubernetes: 1.26.5
Calico cluster: v3.25.0
MetalLB: 0.13.10
kube-proxy in IPVS mode with strict ARP

Calico config:

helm install calico projectcalico/tigera-operator --version v3.25.0 -f calico-config.yaml --namespace tigera-operator

---
installation:
  cni:
    type: Calico
  calicoNetwork:
    bgp: Disabled
    ipPools:
    - cidr: 192.168.0.0/16
      encapsulation: VXLAN

MetalLB was installed using Helm with default parameters. MetalLB config:

cat metallb-namespace.yml
apiVersion: v1
kind: Namespace
metadata:
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
  name: metallb

cat metallb-crds.yml
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: isp-vlan1086-ipp
spec:
  addresses:
  - 1.2.3.17 - 1.2.3.27
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: isp-vlan1086-adv
spec:
  ipAddressPools:
  - isp-vlan1086-ipp
  nodeSelectors:
  - matchLabels:
      kubernetes.io/hostname: itsrv4635.esrv.local
  interfaces:
  - ens192

I tried to follow this article with no luck: https://itnext.io/configuring-routing-for-metallb-in-l2-mode-7ea26e19219e

I hope somebody has a clue what's going on here. I have been struggling with this issue for weeks and don't know what I am missing.

In the last few weeks I worked through more than 20 different threads and GitHub issues with no luck. The most important ones, I guess: https://github.com/projectcalico/calico/issues/6789 https://github.com/metallb/metallb/issues/610

I also worked through an article that describes how the routing should be set up: https://itnext.io/configuring-routing-for-metallb-in-l2-mode-7ea26e19219e

I began with RHEL 9, which had problems with Rook Ceph. I changed to RHEL 8, on which I had no luck with routing, and ended up with Ubuntu 22.04, where I currently have no luck either.

EDIT: I changed from Calico to flannel, applied source-based routing, and can now see that the traffic gets stuck after cni0:

09:12:50.867851 ens192 In  ifindex 3 70:70:8b:1d:6a:bf (tos 0x0, ttl 59, id 54409, offset 0, flags [DF], proto TCP (6), length 60)
    [Client PUB IP].54660 > 1.2.3.17.80: tcp 0
09:12:50.868209 cni0  Out ifindex 6 6a:2a:26:51:8c:94 (tos 0x0, ttl 58, id 54409, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.2.1.60619 > 192.168.2.10.80: tcp 0
09:12:50.868218 vethc393243b Out ifindex 9 6a:2a:26:51:8c:94 (tos 0x0, ttl 58, id 54409, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.2.1.60619 > 192.168.2.10.80: tcp 0
09:12:50.868258 vethc393243b P   ifindex 9 ea:59:75:f8:df:bc (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.2.10.80 > 192.168.2.1.60619: tcp 0
09:12:50.868258 cni0  In  ifindex 6 ea:59:75:f8:df:bc (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.2.10.80 > [Client PUB IP].54660: tcp 0

It now seems that the traffic is simply not able to leave via ens192.

OLED01

1 Answer

Copied from my comment.

For those who still struggle with this problem, I may have a simple, hacky solution that avoids dealing with iptables.
My setup is MetalLB + Calico + kube-proxy (IPVS, strict ARP) on Ubuntu 22.04 with an L2Advertisement; ens160 is for the 10.1.1.0/24 internal management network and ens192 for the public network at 172.30.2.0/24.

Update 2023 Aug 23:

I believe my problem is related to https://github.com/projectcalico/calico/issues/2834.

With the help of a comment, I managed to route packets correctly, as expected.

If it doesn't work for you, try the following:


History:

I think the reason for this headache may be that the policy-based routing does not get applied. (I doubt that it's because we switched kube-proxy from iptables mode to IPVS mode.)

Analysis

First I used this setup:

@k8s-worker-3:~$ ip route
default via 10.1.1.1 dev ens160 proto dhcp src 10.1.1.134 metric 100 
10.1.1.0/24 dev ens160 proto kernel scope link src 10.1.1.134 metric 100 
10.1.1.1 dev ens160 proto dhcp scope link src 10.1.1.134 metric 100  
blackhole 192.168.69.192/26 proto bird 
192.168.140.0/26 via 10.1.1.133 dev tunl0 proto bird onlink 
192.168.182.64/26 via 10.1.1.81 dev tunl0 proto bird onlink 
192.168.196.0/26 via 10.1.1.80 dev tunl0 proto bird onlink 
192.168.230.0/26 via 10.1.1.121 dev tunl0 proto bird onlink 
@k8s-worker-3:~$ ip rule
0:  from all lookup local
32764:  from 172.30.2.0/24 to 192.168.0.0/16 lookup main proto static
32765:  from 172.30.2.0/24 lookup 188 proto static
32766:  from all lookup main
32767:  from all lookup default
@k8s-worker-3:~$ ip route show table 188
default via 172.30.2.1 dev ens192 proto static onlink 

With this, incoming traffic from both the public net and the internal net reaches the node, but no outgoing traffic is sent.

~$ sudo tcpdump -i any 'port 80'

ens192 In  IP [public net].61.32.25886 > 172.30.2.133.http: Flags [S], seq 1648037408, win 42340, options [mss 1460,nop,nop,sackOK,nop,wscale 11], length 0
ens192 In  IP [public net].149.208.57414 > 172.30.2.133.http: Flags [S], seq 4126846053, win 1024, length 0
ens192 In  IP 172.30.2.130.59977 > 172.30.2.133.http: Flags [S], seq 3224408039, win 65535, options [mss 1400,nop,wscale 6,nop,nop,TS val 1944409597 ecr 0,sackOK,eol], length 0

After I added the route

172.30.2.0/24 dev ens192 proto static scope link 

I managed to get access from the same network.

ens192 In  IP 172.30.2.130.60071 > 172.30.2.133.http: Flags [S], seq 523314498, win 65535, options [mss 1400,nop,wscale 6,nop,nop,TS val 1684829178 ecr 0,sackOK,eol], length 0
tunl0 Out IP k8s-worker-3.52990 > 192.168.230.11.http: Flags [S], seq 523314498, win 65535, options [mss 1400,nop,wscale 6,nop,nop,TS val 1684829178 ecr 0,sackOK,eol], length 0
tunl0 In  IP 192.168.230.11.http > k8s-worker-3.52990: Flags [S.], seq 1229294084, ack 523314499, win 64260, options [mss 1440,sackOK,TS val 778290393 ecr 1684829178,nop,wscale 7], length 0
ens192 Out IP 172.30.2.133.http > 172.30.2.130.60071: Flags [S.], seq 1229294084, ack 523314499, win 64260, options [mss 1440,sackOK,TS val 778290393 ecr 1684829178,nop,wscale 7], length 0

However, there was still no outgoing traffic for clients from the public net. But after I added the route

default via 172.30.2.1 dev ens192 proto static 

I managed to get access and outgoing traffic for both the public and the internal network. So my conclusion is that k8s only takes routes from the main table, and the policy-based routing does not get applied.

Solution

Simply add a default route:

default via 172.30.2.1 dev ens192 proto static 

Defect: the node doesn't actually get an IP on ens192, so the machine itself and other pods on this node can't access the internet.

(ens192 Out IP k8s-worker-3.http > [internet].32.46970
The node uses an incorrect source IP on ens192 (from lo or another interface, I suppose), so there is no way for the reply to get back.)

A slightly better hacky solution (enough for most use cases)

Statically assign an IP to ens192 and add a default route.

I assigned the node the exact same IP that is announced by the L2Advertisement, and it's working fine. But you will need to specify the node for the speaker (via nodeSelectors).

network:
  ethernets:
    ens160:
      dhcp4: true
    ens192:
      addresses:
      - 172.30.2.133/24
      routes:
      - to: default
        via: 172.30.2.1

  version: 2

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: hack-172.30.2.133
  namespace: metallb
spec:
  ipAddressPools:
  - hack-172.30.2.133
  nodeSelectors:
  - matchLabels:
      has-ip: "172.30.2.133"
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: hack-172.30.2.133
  namespace: metallb
spec:
  addresses:
  - 172.30.2.133/32 
---

You can also assign another IP that is not in the pool, so that you don't need nodeSelectors and the speaker gets a larger pool to choose from. But that uses up one IP just for internet access.

Or just make sure the assigned IP does not get advertised by another node.
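
A quick way to check whether an address assigned to a node falls inside a MetalLB pool is sketched below. It assumes the pool entries use only the two notations that appear in this thread ("first - last" ranges and CIDR); the helper names are made up for illustration and are not part of MetalLB.

```python
import ipaddress


def pool_networks(entry):
    """Expand one IPAddressPool "addresses" entry into a list of networks."""
    if "-" in entry:
        first, last = (ipaddress.ip_address(part.strip()) for part in entry.split("-"))
        return list(ipaddress.summarize_address_range(first, last))
    return [ipaddress.ip_network(entry.strip(), strict=False)]


def in_pool(address, entries):
    """Return True if the address is covered by any of the pool entries."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for entry in entries for net in pool_networks(entry))


# Pool entries as written in the two configurations in this thread.
print(in_pool("172.30.2.133", ["172.30.2.133/32 "]))
print(in_pool("172.30.2.4", ["172.30.2.133/32 "]))
print(in_pool("1.2.3.30", ["1.2.3.17 - 1.2.3.27"]))
```

An address for which in_pool returns False (such as 172.30.2.4 here) can be put on ens192 without MetalLB ever handing it out to a service.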

Still, the approach so far requires using interface ens192 for internet access. In my scenario, ens192 might have only limited access to the internet (yeah, the Chinese firewall). I want the node to access the internet using ens160 while being accessible via ens192.

A more hacky way

I enabled IP forwarding on the router of ens160 (10.1.1.1 in my case), which produces a traffic flow like this:
incoming ens192 -> k8s node -> outgoing ens160

ens192 In  IP [public client].32.54724 > k8s-worker-3.http: Flags [.], ack 716, win 21, length 0
tunl0 Out IP k8s-worker-3.63364 > 192.168.230.11.http: Flags [.], ack 716, win 21, length 0
tunl0 In  IP 192.168.230.11.http > k8s-worker-3.63364: Flags [.], ack 849, win 502, length 0
ens160 Out IP k8s-worker-3.http > [public client].32.54724: Flags [.], ack 849, win 502, length 0

network:
  ethernets:
    ens160:
      dhcp4: true
    ens192:
      addresses:
      - 172.30.2.4/24 # set an ip not in advertised pool
  version: 2

default via 10.1.1.1 dev ens160 proto dhcp src 10.1.1.134 metric 100 
10.1.1.0/24 dev ens160 proto kernel scope link src 10.1.1.134 metric 100 
10.1.1.1 dev ens160 proto dhcp scope link src 10.1.1.134 metric 100  
172.30.2.0/24 dev ens192 proto kernel scope link src 172.30.2.4 metric 100 
blackhole 192.168.69.192/26 proto bird 
192.168.140.0/26 via 10.1.1.133 dev tunl0 proto bird onlink 
192.168.182.64/26 via 10.1.1.81 dev tunl0 proto bird onlink 
192.168.196.0/26 via 10.1.1.80 dev tunl0 proto bird onlink 
192.168.230.0/26 via 10.1.1.121 dev tunl0 proto bird onlink 

I'm hoping someone can come up with a better solution that doesn't require modifying router settings or using a complicated packet filter.

Using iptables is just too complicated for me, and I couldn't configure it correctly using the methods others provided. Sorry! :)

Azmya
  • @Yunnosch Sorry for that. Since I don't have enough reputation to comment and I'm eager to discuss a solution, I have now rephrased it as an answer. – Azmya Aug 18 '23 at 07:58
  • Absolutely, and with your changes, one with much higher chances for upvotes. I am tempted to upvote just for your cooperation. But that would be wrong. And sadly I am actually technically knowledge-free. So please understand that I don't. – Yunnosch Aug 18 '23 at 08:00