27

I'm running a Google Kubernetes Engine cluster with the "private cluster" option. I've also defined "master authorized networks" to be able to remotely access the environment - this works just fine.

Now I want to set up a CI/CD pipeline using Google Cloud Build - after successfully building a new Docker image, this new image should be automatically deployed to GKE. When I first fired off the new pipeline, the deployment to GKE failed - the error message was something like: "Unable to connect to the server: dial tcp xxx.xxx.xxx.xxx:443: i/o timeout". As I suspected the "master authorized networks" option to be the root cause of the connection timeout, I added 0.0.0.0/0 to the allowed networks and started the Cloud Build job again - this time everything went well, and after the Docker image was created it was deployed to GKE. Good.

The only remaining problem is that I don't really want to allow the whole Internet to access my Kubernetes master - that's a bad idea, isn't it?

Is there a more elegant solution that narrows down access using master authorized networks while still being able to deploy via Cloud Build?

John Topley
  • 113,588
  • 46
  • 195
  • 237
Mizaru
  • 303
  • 3
  • 7
  • The problem here is that the Cloud Build API needs to communicate with your cluster which is why it worked when you changed the authorized network to 0.0.0.0. You'd have to add a range of IPs that the Google APIs use to your master authorized network which does not seem like a good idea. Instead, could the build trigger something to your local machine which, in turn, triggers a call from your local machine to the K8s master to update the image? – Patrick W Aug 21 '18 at 17:24
  • I'm really trying not to involve my local machine in the build process, as I don't like me or my local machine being a vital part of it. I think the problem's root cause is obvious - I thought there might be a "Google-internal" way to establish a connection to my cluster. I've also tried to narrow down the list of IP ranges used by Cloud Build, but trying to do so still left me with a quite long list ... – Mizaru Aug 22 '18 at 07:05
  • You can definitely do an internal way (you can configure a GCE VM on the same VPC network to use the k8s internal end point instead of the external one) but cloud builder won't have an IP that can be considered internal without opening your cluster up to far more IPs than you'd like – Patrick W Aug 22 '18 at 16:15

9 Answers

10

It's currently not possible to add Cloud Build machines to a VPC. Similarly, Cloud Build does not announce the IP ranges of its build machines. So you can't do this today without creating an "SSH bastion instance" or a "proxy instance" on GCE within that VPC.

I suspect this will change soon. GCB existed before GKE private clusters, and private clusters are still a beta feature.
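For anyone heading down that route, here is a minimal sketch of the bastion variant. Instance, network, and zone names are placeholders, and it assumes the bastion's subnet is covered by the master authorized networks and that gcloud/kubectl plus cluster credentials are already set up on the bastion:

# Create a small, internal-only instance in the cluster's VPC (names/zone are hypothetical).
gcloud compute instances create k8s-bastion \
  --zone=us-east1-b --network=my-vpc --subnet=my-subnet --no-address

# Run kubectl on the bastion over an IAP-tunnelled SSH session, feeding it the manifest from stdin.
gcloud compute ssh k8s-bastion --zone=us-east1-b --tunnel-through-iap \
  --command='kubectl apply -f -' < k8s/deployment.yaml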

ahmet alp balkan
  • 42,679
  • 38
  • 138
  • 214
  • 1
    ok - thx for the explanation. I will then figure out how to setup a ssh/proxy into my VPC. Hopefully Cloud Build will learn to handle private GKE clusters soon. – Mizaru Aug 23 '18 at 09:10
  • 1
    just to let you know - I've implemented something one might call a workaround in the meantime: Basically I'm modifying the allowed master networks right before and after the kubernetes deployment step. This way I still have to allow access from 0.0.0.0/0 but - only for a few seconds. As this "only" concerns our development environment I can live with this security trade off here. – Mizaru Aug 31 '18 at 15:04
  • 3
    Instead of 0.0.0.0/0, you could add one more deployment step before the k8s deployment which basically looks up the public ip of the running pipeline... like using the 'curl' container to 'curl icanhazip.com', then use this instead of 0.0.0.0.. – cosmicnag Sep 03 '18 at 13:20
  • Very good idea! I'm gonna check that out. Also didn't know "icanhazip.com" before - very useful service that for sure can be handy in many other occasions. – Mizaru Sep 04 '18 at 07:39
  • 1
    Is there are a proper solution (provided by GCP team) for this yet? – bluelabel Dec 12 '18 at 22:57
  • 2
    I found this in gcp: https://cloud.google.com/solutions/creating-kubernetes-engine-private-clusters-with-net-proxies – bluelabel Dec 13 '18 at 01:07
  • Beside using `curl icanhazip.com` to find out your effective public IP, I've been using `curl checkip.amazonaws.com` (or `curl https://checkip.amazonaws.com` if your company has different egress IPs for HTTP vs. HTTPS). The `checkip.amazonaws.com` address is used by various AWS scripts, so it is a stable, long term hostname. (This is not to promote the competitor -- I'm just using whatever works.) Also, the new solution suggested by @dinvlad last month (Jan 16, 2021) looks like a great solution. – Vincent Yin Feb 21 '21 at 06:01
5

We ended up doing the following:

1) Remove the deployment step from cloudbuild.yaml

2) Install Keel inside the private cluster and give it Pub/Sub Editor privileges in the Cloud Build / registry project (a rough sketch of this step follows below)

Keel will monitor changes in images and deploy them automatically based on your settings.

This has worked out great: we now get SHA-tagged image updates deployed automatically, without adding VMs or running any kind of bastion/SSH host.
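A rough sketch of step 2, assuming Keel runs under a dedicated service account and that workloads opt in via Keel's keel.sh/policy label - project, service-account, and deployment names here are hypothetical, and the exact settings may differ by Keel version (check the Keel docs):

# Let the service account Keel uses subscribe to the GCR / Cloud Build Pub/Sub topics
# in the project that hosts the builds and the registry.
gcloud projects add-iam-policy-binding my-build-project \
  --member="serviceAccount:keel@my-build-project.iam.gserviceaccount.com" \
  --role="roles/pubsub.editor"

# Opt a workload in to automatic updates whenever a matching image is pushed.
kubectl label deployment my-app keel.sh/policy=force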

Farhan Husain
  • 51
  • 1
  • 2
  • This is an excellent way to perform Kubernetes cluster deployments, I am using that for a couple of clients too. In addition to solving the reachability problem of a private control plane, it is a nice separation between Kubernetes and builds. With keel (keel.io) you not only get automatic deploys, you can also manually approve them in a web ui. – Overbryd May 03 '21 at 17:02
  • This is a whole new topic, isn't it -- stepping back and redesigning the CI/CD with a different paradigm. If so, `FluxCD` and `ArgoCD` can do the same -- in fact, they can reach back to your GitHub and update the image tag in your GitHub repo so that your GitHub is kept in sync with your newly uploaded Docker image. – Vincent Yin Sep 17 '21 at 16:18
4

Updated answer (02/22/2021)

Unfortunately, while the below method works, IAP tunnels suffer from rate-limiting, it seems. If there are a lot of resources deployed via kubectl, then the tunnel times out after a while. I had to use another trick, which is to dynamically whitelist Cloud Build IP address via Terraform, and then to apply directly, which works every time.

Original answer

It is also possible to create an IAP tunnel inside a Cloud Build step:

- id: kubectl-proxy
  name: gcr.io/cloud-builders/docker
  entrypoint: sh
  args:
  - -c
  - docker run -d --net cloudbuild --name kubectl-proxy
      gcr.io/cloud-builders/gcloud compute start-iap-tunnel
      bastion-instance 8080 --local-host-port 0.0.0.0:8080 --zone us-east1-b &&
    sleep 5

This step starts a background Docker container named kubectl-proxy on the cloudbuild network, which is shared by all of the other Cloud Build steps. The container establishes an IAP tunnel using the Cloud Build service account identity. The tunnel connects to a GCE instance with a SOCKS or an HTTPS proxy pre-installed on it (an exercise left to the reader; a sketch follows below).
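For that exercise, one option is a plain SOCKS server such as Dante on the proxy instance. A rough sketch for a Debian-based image follows - directive names and the interface name vary between Dante versions and images, so treat this as a starting point rather than a drop-in config:

sudo apt-get update && sudo apt-get install -y dante-server

sudo tee /etc/danted.conf > /dev/null <<'EOF'
logoutput: syslog
internal: 0.0.0.0 port = 8080   # the port targeted by start-iap-tunnel above (Dante's default is 1080)
external: ens4                  # primary NIC on most GCE Debian images
clientmethod: none
socksmethod: none
client pass {
  from: 0.0.0.0/0 to: 0.0.0.0/0
}
socks pass {
  from: 0.0.0.0/0 to: 0.0.0.0/0
}
EOF

sudo systemctl restart danted

# Remember a firewall rule that allows IAP's range (35.235.240.0/20) to reach this port.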

Inside subsequent steps, you can then access the cluster simply as

- id: setup-k8s
  name: gcr.io/cloud-builders/kubectl
  entrypoint: sh
  args:
  - -c
  - HTTPS_PROXY=socks5://kubectl-proxy:8080 kubectl apply -f config.yml

The main advantages of this approach compared to the others suggested above:

  • No need to have a "bastion" host with a public IP - the kubectl-proxy host can be entirely private, thus maintaining the privacy of the cluster
  • Tunnel connection relies on default Google credentials available to Cloud Build, and as such there's no need to store/pass any long-term credentials like an SSH key
dan
  • 1,144
  • 12
  • 17
  • This looks great! It took me a while to understand the network flow. A diagram illustrating every hop and the corresponding communication protocol (e.g., "https") would have helped a lot. – Vincent Yin Feb 21 '21 at 06:32
  • 2
    Unfortunately, while this method works, IAP tunnels suffer from rate-limiting, it seems. If there are a lot of resources deployed via `kubectl`, then the tunnel times out after a while. I had to use another trick which is to whitelist Cloud Build IP via Terraform and then apply directly, which works every time. I updated the answer. – dan Feb 22 '21 at 07:54
  • 1
    So you whitelist Cloud Build's (public) IP in the `master authorized networks` CIDR, right? That only works when the K8s master node has an *external* IP, right? What if the master node has an *internal* IP? Your IAP tunneling method (rate limiting notwithstanding) provides the TCP-level connectivity in the latter scenario. And that's the key differentiator to all other solutions I've seen. – Vincent Yin Feb 22 '21 at 08:11
  • 1
    That is true, although I figured out a nice balance for our clusters: make cluster nodes private while having master node remain public. This way, master authorized networks only allow access from select few IPs, and the master can't be accessed even from non-whitelisted GCP nodes outside the cluster. – dan Feb 23 '21 at 16:47
  • do you think this could be the same answer for this question? https://stackoverflow.com/questions/67251653/how-to-send-requests-between-servers-in-private-network-in-gcp?noredirect=1#comment118872380_67251653 – Arrajj Apr 25 '21 at 12:33
  • Yes, that would address it too, I believe! Looking forward to it as well. – dan Apr 26 '21 at 15:31
  • I am still working on it, i installed the proxy on the instance and the detached container is working fine... However I have a question, if I have a request to be done inside a python job through that SOCKS5 connection, the URL in the code should be : "HTTP://localhost:8080" or another format? – Arrajj May 04 '21 at 09:46
  • @Khaledarja I think with socks5, normally you'd use `socks5h://localhost:8080` or `socks5://localhost:8080` (the difference is in how DNS is resolved - you probably want the latter) – dan May 06 '21 at 19:36
  • Hi dinvlad, I tried following the above steps and installed a danted socks proxy server in my bastion host, but when i try to run this from cloud build, I am getting the below error - The connection to the server localhost:8080 was refused - did you specify the right host or port? – Rajathithan Rajasekar May 28 '21 at 21:36
  • @RajathithanRajasekar it depends on which port your danted proxy is using - the default is 1080 I think. So you might want to try that - I only gave an example because everyone's setup is different. – dan May 29 '21 at 04:11
  • I had modified the port from 1080 to 8080, I am able to establish IAP-Tunnel connection to the bastion host from my cloud shell and also able to execute the kubectl command using the SOCKS connection. But when in cloud build , when i try to run the kubectl command after establishing the connection to the kubectl-bastion from the kubectl-proxy container, I am unable to run the kubectl commands. – Rajathithan Rajasekar May 29 '21 at 04:29
  • - name: gcr.io/cloud-builders/docker env: - CLOUDSDK_COMPUTE_ZONE=us-central1 - CLOUDSDK_CORE_PROJECT=PROJECTNAME args: - '-c' - >- docker run -d --net cloudbuild --name kubectl-proxy gcr.io/cloud-builders/gcloud compute start-iap-tunnel bastion 8080 --local-host-port 0.0.0.0:8080 --zone us-central1-a --project PROJECTNAME && sleep 5 id: bastion entrypoint: sh – Rajathithan Rajasekar May 29 '21 at 04:31
  • @dinvlad - - name: gcr.io/cloud-builders/kubectl env: - CLOUDSDK_COMPUTE_ZONE=us-central1 - CLOUDSDK_CORE_PROJECT=PROJECTNAME args: - '-c' - 'HTTPS_PROXY=socks5://kubectl-proxy:8080 kubectl get pods -n operator' id: Deploy-to-k8s entrypoint: sh – Rajathithan Rajasekar May 29 '21 at 04:31
  • Not sure tbh, if you could create a Github gist with your full config, then I might be able to help. It’s a bit hard to troubleshoot these things in the comments ;-) – dan May 29 '21 at 17:51
  • 1
    @dinvlad - Thank you very much for helping me out. I have added the port of the cloud build steps where I established the connection to kubectl bastion , but in the next step when i try to execute the kubectl command it failed, https://gist.github.com/rajathithan/4faf04f94c40772a997e7c774a6fdb9e – Rajathithan Rajasekar May 30 '21 at 00:19
  • When it comes to the step "setup-k8s", I get this output: The connection to the server localhost:8080 was refused - did you specify the right host or port? Can someone help me please? :( I have a bastion with tinyproxy installed and configured on port 8080, previously 8888. I can establish the connection from the cmd without using docker. Ex: "HTTPS_PROXY=localhost:8080 kubectl get pods" – Abdenour Keddar Sep 23 '21 at 10:06
4

Our workaround was to add steps to the CI/CD pipeline that whitelist Cloud Build's IP via the master authorized networks.

Note: an additional role is needed for the Cloud Build service account:

Kubernetes Engine Cluster Admin
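For reference, granting that role might look like the following (project ID and number are placeholders; Cloud Build's default service account is PROJECT_NUMBER@cloudbuild.gserviceaccount.com):

# Kubernetes Engine Cluster Admin = roles/container.clusterAdmin
gcloud projects add-iam-policy-binding MY_PROJECT_ID \
  --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" \
  --role="roles/container.clusterAdmin"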

In cloudbuild.yaml, add the whitelist step before the deployment(s).

This step fetches Cloud Build's external IP and then updates the cluster's master authorized networks:

# Authorize Cloud Build to Access the Private Cluster (Enable Control Plane Authorized Networks)
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Authorize Cloud Build'
  entrypoint: 'bash'
  args:
    - -c
    - |
      apt-get install dnsutils -y &&
      cloudbuild_external_ip=$(dig @resolver4.opendns.com myip.opendns.com +short) &&
      gcloud container clusters update my-private-cluster --zone=$_ZONE --enable-master-authorized-networks --master-authorized-networks $cloudbuild_external_ip/32 &&
      echo $cloudbuild_external_ip

Since Cloud Build's IP has been whitelisted, deployments will proceed without the i/o timeout error.

This removes the complexity of setting up VPN / private worker pools.

Disable the Control Plane Authorized Networks after the deployment.

# Disable Control Plane Authorized Networks after Deployment
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  id: 'Disable Authorized Networks'
  entrypoint: 'gcloud'
  args:
    - 'container'
    - 'clusters'
    - 'update'
    - 'my-private-cluster'
    - '--zone=$_ZONE'
    - '--no-enable-master-authorized-networks'

This approach works well even in cross-project / cross-environment deployments.

cypher15
  • 104
  • 7
2

I got Cloud Build working with my private GKE cluster by following this Google document: https://cloud.google.com/architecture/accessing-private-gke-clusters-with-cloud-build-private-pools

This allows me to use Cloud Build and Terraform to manage a GKE cluster with authorized network access to the control plane enabled. I considered trying to maintain a ridiculous whitelist, but that would ultimately defeat the purpose of using authorized network access control to begin with.

I would note that Cloud Build private pools are generally slower than non-private pools. This is due to the serverless nature of private pools. I have not experienced the rate limiting that others have mentioned so far.
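If it helps, the pool creation from that document boils down to something like this (region, network, and pool names are placeholders, and the Service Networking peering between your VPC and Google's service producer network must already be in place - see the linked document for that part):

# Create a private worker pool whose workers run on a network peered with the VPC
# that can reach the cluster's control plane.
gcloud builds worker-pools create my-private-pool \
  --region=us-east1 \
  --peered-network=projects/MY_PROJECT_ID/global/networks/my-vpc

Builds then reference the pool through the pool option in cloudbuild.yaml (an example snippet appears in another answer below).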

siesta
  • 1,365
  • 2
  • 16
  • 21
1

Update: I suppose this won't work with production strength for the same reason as @dinvlad's update above, i.e., rate limiting in IAP. I'll leave my original post here because it does solve the network connectivity problem, and illustrates the underlying networking mechanism.

Furthermore, even if we don't use it for Cloud Build, my method provides a way to tunnel from my laptop to a private K8s master node. Therefore, I can edit K8s YAML files on my laptop (e.g., using VS Code) and immediately execute kubectl from my laptop, rather than having to ship the code to a bastion host and execute kubectl inside it. I find this a big boost to development productivity.

Original answer

================

I think I might have an improvement to the great solution provided by @dinvlad above.

I think the solution can be simplified without installing an HTTP Proxy Server. Still need a bastion host.

I offer the following Proof of Concept (without HTTP Proxy Server). This PoC illustrates the underlying networking mechanism without involving the distraction of Google Cloud Build (GCB). (When I have time in the future, I'll test out the full implementation on Google Cloud Build.)

Suppose:

  1. I have a GKE cluster whose master node is private, e.g., having an IP address 10.x.x.x.
  2. I have a bastion Compute Instance named my-bastion. It has only a private IP, no external IP. The private IP is within the master authorized networks CIDR of the GKE cluster. Therefore, from within my-bastion, kubectl works against the private GKE master node. Because my-bastion doesn't have an external IP, my home laptop connects to it through IAP.
  3. My laptop at home, with my home internet public IP address, doesn't readily have connectivity to the private GKE master node above.

The goal is for me to execute kubectl on my laptop against that private GKE cluster. From a network architecture perspective, my home laptop's position is like that of the Google Cloud Build server.

Theory: Knowing that gcloud compute ssh (and the associated IAP) is a wrapper for SSH, the SSH Dynamic Port Forwarding should achieve that goal for us.

Practice:

## On laptop:
LAPTOP~$ kubectl get ns
^C            <<<=== Without setting anything up, this hangs (no connectivity to GKE).

## Set up SSH Dynamic Port Forwarding (SOCKS proxy) from laptop's port 8443 to my-bastion.
LAPTOP~$ gcloud compute ssh my-bastion --ssh-flag="-ND 8443" --tunnel-through-iap

In another terminal of my laptop:

## Without using the SOCKS proxy, this returns my laptop's home public IP:
LAPTOP~$ curl https://checkip.amazonaws.com
199.xxx.xxx.xxx

## Using the proxy, the same curl command above now returns a different IP address, 
## i.e., the IP of my-bastion. 
## Note: Although my-bastion doesn't have an external IP, I have a GCP Cloud NAT 
## for its subnet (for purpose unrelated to GKE or tunneling).
## Anyway, this NAT is handy as a demonstration for our curl command here.
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 curl -v --insecure https://checkip.amazonaws.com
* Uses proxy env variable HTTPS_PROXY == 'socks5://127.0.0.1:8443'  <<<=== Confirming it's using the proxy
...
* SOCKS5 communication to checkip.amazonaws.com:443
...
* TLSv1.2 (IN), TLS handshake, Finished (20):             <<<==== successful SSL handshake
...
> GET / HTTP/1.1
> Host: checkip.amazonaws.com
> User-Agent: curl/7.68.0
> Accept: */*
...
< Connection: keep-alive
<
34.xxx.xxx.xxx            <<<=== Returns the GCP Cloud NAT'ed IP address for my-bastion 

Finally, the moment of truth for kubectl:

## On laptop:
LAPTOP~$ HTTPS_PROXY=socks5://127.0.0.1:8443 kubectl --insecure-skip-tls-verify=true get ns
NAME              STATUS   AGE
default           Active   3d10h
kube-system       Active   3d10h
Vincent Yin
  • 1,196
  • 5
  • 13
  • I think the problem here is, How to launch that first terminal in a detached mode in cloud build steps? – Arrajj May 04 '21 at 10:02
  • To answer your question^^^, @dinvlad's first code snippet of starting a *background* Docker container does the trick. His answer has a few good nuggets that were not elaborated on -- it is quite a trick in both Docker and networking. – Vincent Yin May 05 '21 at 13:37
  • Yep - apologies if it was too concise. Thanks for elaborating! – dan May 06 '21 at 19:38
1

It is now possible to create a pool of VMs that are connected to your private VPC and can be accessed from Cloud Build.

Quickstart
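Once such a private worker pool exists, a build opts into it via the pool option in cloudbuild.yaml - roughly like this (project, region, and pool names are placeholders):

options:
  pool:
    name: 'projects/MY_PROJECT_ID/locations/us-east1/workerPools/my-private-pool'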

p13rr0m
  • 1,107
  • 9
  • 21
  • This doesn't solve the original problem though – Alexander Meise Nov 14 '22 at 17:17
  • If you can access your VPC, you don't need authorized master networks in the first place. See https://cloud.google.com/architecture/accessing-private-gke-clusters-with-cloud-build-private-pools for a complete answer. – p13rr0m Nov 15 '22 at 14:35
1

My solution might not be the prettiest, but it's fairly straightforward: I'm temporarily whitelisting Cloud Build's public IP to run kubectl and update the deployments.

This is what my cloudbuild.yaml looks like. First we run a step that whitelists the public IP:

- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: bash
  args:
  - '-c'
  - |
    apt update \
    && apt install -y jq \
    && cd ~ \
    && gcloud container clusters describe CLUSTERNAME --zone=CLUSTERZONE --project GCPPROJECT --format="json(masterAuthorizedNetworksConfig.cidrBlocks)" > ~/manc.json \
    && (jq ".update.desiredMasterAuthorizedNetworksConfig = .masterAuthorizedNetworksConfig | del(.masterAuthorizedNetworksConfig) | .update.desiredMasterAuthorizedNetworksConfig.enabled = \"true\" | .name = \"projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME\" | .update.desiredMasterAuthorizedNetworksConfig.cidrBlocks += [{\"cidrBlock\":\"`curl -s ifconfig.me`/32\",\"displayName\":\"CloudBuild tmp\"}]" ./manc.json) > ~/manc2.json \
    && curl -X PUT -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type:application/json" -H "Accept: application/json" --data "$(cat manc2.json)" "https://container.googleapis.com/v1beta1/{projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME}"

We can now run whatever kubectl commands we'd like.
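For example, a deployment step in between the two could look like this (the kubectl builder reads the target cluster from the CLOUDSDK_* variables; fill in the same placeholders as above):

- name: 'gcr.io/cloud-builders/kubectl'
  env:
  - 'CLOUDSDK_COMPUTE_ZONE=CLUSTERZONE'
  - 'CLOUDSDK_CONTAINER_CLUSTER=CLUSTERNAME'
  - 'CLOUDSDK_CORE_PROJECT=GCPPROJECT'
  args: ['apply', '-f', 'k8s/']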

The following step then removes the IP from the authorized networks again:

- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: bash
  args:
  - '-c'
  - |
    apt update \
    && apt install -y jq \
    && cd ~ \
    && gcloud container clusters describe CLUSTERNAME --zone=CLUSTERZONE --project GCPPROJECT --format="json(masterAuthorizedNetworksConfig.cidrBlocks)" > ~/manc.json \
    && (jq ".update.desiredMasterAuthorizedNetworksConfig = .masterAuthorizedNetworksConfig | del(.masterAuthorizedNetworksConfig) | .update.desiredMasterAuthorizedNetworksConfig.enabled = \"true\" | .name = \"projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME\" | del(.update.desiredMasterAuthorizedNetworksConfig.cidrBlocks[] | select(.displayName==\"CloudBuild tmp\"))" ./manc.json) > ~/manc2.json \
    && curl -X PUT -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type:application/json" -H "Accept: application/json" --data "$(cat manc2.json)" "https://container.googleapis.com/v1beta1/{projects/GCPPROJECT/locations/CLUSTERZONE/clusters/CLUSTERNAME}"

Please fill in CLUSTERNAME, GCPPROJECT and CLUSTERZONE. Feel free to improve =)

Tim
  • 21
  • 3
0

Previously, the official GCP guidance was to set up an HA VPN to facilitate a connection between GKE and a custom build pool. In addition to being tedious, complex, and costly (requiring you to reserve 4 static IP addresses!), this method has a serious downside that was a deal-breaker for me: you must disable the public IP address of the control plane for any of this setup to accomplish anything, which means you need something like a bastion instance to connect to the control plane afterwards.

There has been an open issue for the past few years which very recently got an update including a tutorial for a much more satisfactory solution: setting up a NAT VM instance for a Custom Build Pool and adding it as an Authorized Network to your GKE cluster.

Having just today followed the referenced tutorial, I can say this method works with relatively little pain.
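Very roughly, and with names, ranges, and zones as placeholders, the tutorial's approach amounts to the following. The real tutorial covers the exact peering/route-export configuration and hardening, so treat this purely as an orientation sketch:

# A NAT VM (with a public IP) in the VPC that is peered with the private build pool.
gcloud compute instances create build-nat \
  --zone=us-east1-b --network=my-vpc --subnet=my-subnet --can-ip-forward

# On the VM: forward and masquerade the pool's traffic out of its public interface.
sudo sysctl -w net.ipv4.ip_forward=1
sudo iptables -t nat -A POSTROUTING -o ens4 -j MASQUERADE

# Route traffic for the control plane's public endpoint through the NAT VM and
# export that route over the Service Networking peering used by the build pool.
gcloud compute routes create control-plane-via-nat \
  --network=my-vpc --destination-range=CONTROL_PLANE_IP/32 \
  --next-hop-instance=build-nat --next-hop-instance-zone=us-east1-b
gcloud compute networks peerings update servicenetworking-googleapis-com \
  --network=my-vpc --export-custom-routes

# Finally, authorize the NAT VM's public IP on the cluster.
gcloud container clusters update my-private-cluster --zone=us-east1-b \
  --enable-master-authorized-networks \
  --master-authorized-networks=NAT_VM_PUBLIC_IP/32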

DragonBobZ
  • 2,194
  • 18
  • 31