
I am following RKE2's Quick Start guide and this Dell article to get an RKE2 server and agent pair working. I am using v1.24.9+rke2r2.

The setup runs in VirtualBox with Ubuntu 20.04. Both VMs use Adapter 1 for NAT, which always shows up as enp0s3 with an IP of 10.0.2.15. Adapter 2 uses the Host-only Adapter option and shows up as enp0s8, with an IP of 192.168.56.101 on the server (vm-01) and 192.168.56.102 on the agent (vm-02). The two VMs can ping each other with this setup.

The server node works just fine. Before restarting the server service, I modified /etc/rancher/rke2/config.yaml as follows:

tls-san:
  - "192.168.56.101"
  - "192.168.56.102"
server: 
  - "https://192.168.56.101:9345"
token: 
  - "K101795742c954c5c8f5d9aa21588a6e6990f29ccdb3e5412292f01ea4bb41f31ae::server:6bf9ab3e0a1e214d85335657578cac67"

On the agent node (vm-02), I set /etc/rancher/rke2/config.yaml as follows:

server: 
  - "https://192.168.56.101:9345"
token: 
  - "K101795742c954c5c8f5d9aa21588a6e6990f29ccdb3e5412292f01ea4bb41f31ae::server:6bf9ab3e0a1e214d85335657578cac67"

I then started the agent service. The first issue I noticed is that the kube-proxy-vm-02 pod never comes up on the initial start. I must restart the agent service for it to appear.

The second issue is that the extra rke2-coredns-rke2-coredns-XXX and rke2-canal-XXX pods that come up for the agent never succeed. The coredns pod is always stuck in the Pending state, and the canal pod ends up in the Init:CrashLoopBackOff state. I ran journalctl -u rke2-agent -f to check for errors, and this shows up:

Jan 18 11:49:48 vm-02 rke2[2346]: time="2023-01-18T11:49:48+07:00" level=info msg="Connecting to proxy" url="wss://10.0.2.15:9345/v1-rke2/connect"
Jan 18 11:49:48 vm-02 rke2[2346]: time="2023-01-18T11:49:48+07:00" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 10.0.2.15:9345: connect: connection refused"
Jan 18 11:49:48 vm-02 rke2[2346]: time="2023-01-18T11:49:48+07:00" level=error msg="Remotedialer proxy error" error="dial tcp 10.0.2.15:9345: connect: connection refused"

It seems that the agent service keeps trying to reach the server node at 10.0.2.15:9345, even though I clearly specified that the server is located at 192.168.56.101:9345. This looks like the cause of my problem. Could someone tell me what I should do to get past this and proceed further? Many thanks!

CaTx

1 Answer


This is similar to issue 3176 on RKE2's GitHub. I managed to get it to work using the following steps on the server node:

$ sudo systemctl enable rke2-server
$ sudo systemctl start rke2-server
... WAIT FOR ALL PODS TO BE READY
$ sudo touch /etc/rancher/rke2/config.yaml
$ sudo nano /etc/rancher/rke2/config.yaml
... EDIT CONFIG FILE
$ sudo systemctl stop rke2-server
$ sudo rke2 server --cluster-reset --node-ip 192.168.56.101 --node-external-ip 192.168.56.101 --advertise-address 192.168.56.101
$ sudo reboot

Afterwards, it works as intended.
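
One way to confirm the change took effect is to check which address the agent is dialing. Below is a minimal sketch, assuming the rke2-agent log lines keep the url="wss://HOST:9345/v1-rke2/connect" format shown in the question. The function name proxy_hosts is a hypothetical helper of mine, not part of RKE2.

```shell
# Hypothetical helper: print each distinct host that the agent tries to
# reach for the remotedialer proxy. Reads rke2-agent journal lines on stdin
# and assumes the log format quoted in the question.
proxy_hosts() {
  sed -n 's|.*url="wss://\([^:/"]*\):[0-9]*/v1-rke2/connect".*|\1|p' | sort -u
}

# Usage on the agent node (vm-02):
#   sudo journalctl -u rke2-agent --no-pager | proxy_hosts
```

If the server is advertising the host-only address, I would expect recent log entries to list 192.168.56.101 rather than 10.0.2.15. Older entries from before the reset will still show the NAT address.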

CaTx