
I have a test-executor Pod in a K8s cluster, created through Helm, which requests a dynamically provisioned PersistentVolume where it stores the test results.

Now I would like to get the contents of this volume. It seems like quite a natural thing to do. I would expect something like `kubectl download pv <id>`, but I can't find anything by googling.

How can I get the contents of a PersistentVolume?

I am on AWS EKS, so the AWS API is also an option. I can also access ECR, so perhaps I could somehow store it as an image and download it?

Or, in general, I am looking for a way to transfer a directory, even as an archive. But it should happen after the container has finished and is no longer running.

Ondra Žižka
  • I have created a new issue to get this functionality without using a PersistentVolume: https://github.com/kubernetes/kubernetes/issues/111045 Please support me and give this issue weight. No less important, please help me find the people who can bring this topic to success. – Hardy Hobeck Sep 29 '22 at 14:14

1 Answer


I can think of two options to fulfill your needs:

  1. Create a pod with the PV attached to it and use `kubectl cp` to copy the contents wherever you need. You could, for example, use a PodSpec similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: dataaccess
spec:
  containers:
  - name: alpine
    image: alpine:latest
    # keep the container running so there is time to copy the data out
    command: ['sleep', 'infinity']
    volumeMounts:
    - name: mypvc
      mountPath: /data
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: mypvc

Please note that `mypvc` should be the name of the PersistentVolumeClaim that is bound to the PV you want to copy data from.
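If you only know the PV's name, you can look up the claim it is bound to with a jsonpath query, for example (`<pv-name>` is a placeholder):

kubectl get pv <pv-name> -o jsonpath='{.spec.claimRef.namespace}/{.spec.claimRef.name}'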

Once the pod is running, you can run something like the following to copy the data from any machine that has kubectl configured to connect to your cluster:

kubectl cp dataaccess:/data data/
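In an automated setup (e.g. a CI job) the whole flow can be scripted. A minimal sketch, assuming the PodSpec above is saved as dataaccess.yaml; note that `kubectl cp` relies on tar being present in the target container, which alpine provides:

kubectl apply -f dataaccess.yaml
# wait until the pod is actually running before copying
kubectl wait --for=condition=Ready pod/dataaccess --timeout=120s
kubectl cp dataaccess:/data ./data
# clean up the helper pod afterwards
kubectl delete pod dataaccess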
  2. Mount the PV's EBS volume on an EC2 instance and copy the data from there. This option is harder to describe in detail because it depends on more context about what you're trying to achieve; a rough sketch follows below.
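A rough sketch of this second option using the AWS CLI, assuming the in-tree EBS provisioner, a volume that is no longer attached to a node, and placeholder volume/instance IDs and bucket name:

# look up the EBS volume behind the PV; the value may carry an "aws://<zone>/" prefix
kubectl get pv <pv-name> -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'
# attach it to an EC2 instance in the same availability zone
# (the device name on the instance may differ, e.g. appear as an NVMe device)
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/xvdf
# then, on that instance, mount it and push the contents wherever needed, e.g. S3
sudo mount /dev/xvdf /mnt
aws s3 sync /mnt s3://my-bucket/test-results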
whites11
  • Could you please elaborate a bit? 1) `kubectl cp` only works within the cluster, right? I need to get it outside the cluster. 2) Can I mount these on the fly at will? This will be automated. Mounting to some EC2 could work but uploading to S3 is easier. I meant something that would download the content directly by `kubectl`. How do people migrate their content when they decide to go out of the cloud? – Ondra Žižka May 17 '18 at 08:45
  • Nope, `kubectl cp` works from wherever kubectl does, so you can use it to copy data from your cluster to your local workstation if you wish. – whites11 May 17 '18 at 08:55
  • As per the other question, not sure what you can do with AWS EBS, but I'm pretty sure you can somehow copy your stuff to S3. – whites11 May 17 '18 at 08:56
  • Added an example of my first proposal. – whites11 May 17 '18 at 09:05
  • What I am doing is `helm test`. This creates a container which runs tests against some pods. `helm` is being called from Jenkins in a private datacenter. I need to get the test results (100's of files) back to that Jenkins. I would run the tests (Gatling) from that Jenkins but some more influential people are pushing Helm everywhere. – Ondra Žižka May 17 '18 at 11:24
  • Yeah I think the first solution is just doable. You can script pod creation and kubectl cp inside the Jenkins CI – whites11 May 17 '18 at 11:27
  • Would the first example run for long enough to give you time to copy? In my case it simply finished and was in a crashloopbackoff. This example worked well for me though: https://stackoverflow.com/questions/31870222/how-can-i-keep-a-container-running-on-kubernetes – Satyajit Das Nov 07 '19 at 11:52
  • Of course not, I forgot to specify a command and the default command for alpine is something that returns immediately (hence the crash loop backoff). Updated the example spec. – whites11 Nov 08 '19 at 12:16
  • Just a note on the command used to keep the pod alive: you can also use `tail -f /dev/null` (in place of `sleep 999999`) if you want the pod to live indefinitely until explicit deletion. – bluu May 26 '20 at 10:17
  • Very useful. I incorporated this technique with a link back in a blog post about using rsync with kubernetes for disaster recovery: https://vhs.codeberg.page/post/recover-files-kubernetes-persistent-volume/ – vhs Mar 06 '22 at 02:23