Questions tagged [kubeflow]

Kubeflow Training Operator provides Kubernetes custom resources that make it easy to run distributed or non-distributed TensorFlow/PyTorch/Apache MXNet/XGBoost/MPI jobs on Kubernetes.

GitHub: https://github.com/kubeflow/training-operator

433 questions
23 votes, 9 answers

Sudden ImportError: cannot import name 'appengine' from 'requests.packages.urllib3.contrib' error on pipeline

My pipelines and schedulers were running smoothly without any problems. After I went out to lunch, I changed the number of epochs a neural network would run, saved the .yaml file again, and left it in the bucket named "budgetff". Afterwards,…
18 votes, 7 answers

Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the error below is displayed at the end of each epoch: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled:…
Radhi
  • 6,289
  • 15
  • 47
  • 68
11 votes, 1 answer

Kubeflow vs other options

I am trying to find out when it makes sense to create your own Kubeflow MLOps platform: If you are a TensorFlow-only shop, do you still need Kubeflow? Why not TFX only? Orchestration can be done with Airflow. Why use Kubeflow if all you are using…
Cengiz
  • 303
  • 2
  • 9
10 votes, 3 answers

How to pass data or files between Kubeflow containerized components in Python

I'm exploring Kubeflow as an option to deploy and connect various components of a typical ML pipeline. I'm using Docker containers as Kubeflow components, and so far I've been unable to successfully use the ContainerOp.file_outputs object to pass results…
Ash
  • 969
  • 3
  • 16
  • 28
8 votes, 2 answers

How to use tqdm in Kubernetes

I'm using Kubernetes, and a training job runs on the cluster. I'm using tqdm as a progress bar but, contrary to what I expected, the progress bar doesn't show up when I check the Kubernetes Pod logs. Does anyone have a solution to this problem?
Piljae Chae
  • 987
  • 10
  • 23
8 votes, 2 answers

microk8s Broken K8s Dashboard and Kubeflow Dashboard

I'm using microk8s in an Ubuntu 18.04 LTS VM, 3 cores, 60 GB storage, 12 GB of memory. I followed the instructions from microk8s website here to install it. $ snap install microk8s --classic --channel=1.18/stable $ sudo microk8s start $ sudo…
lwileczek
  • 2,084
  • 18
  • 27
8 votes, 1 answer

Kubeflow Pipeline Termination Notification

I tried to add logic that sends a Slack notification when the pipeline terminates due to an error. I tried to implement this with ExitHandler, but it seems the ExitHandler can't depend on any op. Do you have any good ideas?
Wenmin Wu
  • 1,808
  • 12
  • 24
8 votes, 6 answers

How to get the id of the run from within a component?

I'm doing some experimentation with Kubeflow Pipelines and I'm interested in retrieving the run id to save along with some metadata about the pipeline execution. Is there any way I can do so from a component like a ContainerOp?
DSF
  • 83
  • 1
  • 3
7 votes, 3 answers

kubeflow pipeline dynamic output list as input parameter

I use a ParallelFor over a dynamic list. I want to collect all the outputs from the loop and pass them to another ContainerOp. Something like the following, which obviously does not work, since the outputs list will be static. with…
user3599803
  • 6,435
  • 17
  • 69
  • 130
7 votes, 2 answers

How to escape "{{" and "}}" in an Argo workflow

I want to run an Argo workflow in which a value is surrounded by double braces. Argo tries to resolve it, but I don't want Argo to resolve it. The following is a fragment of a Katib StudyJob workflow manifest. workerSpec: goTemplate: …
shabbir
  • 121
  • 2
  • 6
6 votes, 1 answer

How do we assign pods properly so that KFServing can scale down GPU Instances to zero?

I'm setting up an InferenceService using Argo and KFServing with Amazon EKS (Kubernetes). It's important to know that our team has one EKS cluster per environment, which means there can be multiple applications within our cluster that we don't…
Daniel Hair
  • 270
  • 1
  • 4
  • 15
5 votes, 1 answer

Is it possible to mix Kubeflow components with TensorFlow Extended components?

It looks like Kubeflow has deprecated all of their TFX components. I currently have some custom Kubeflow components that help launch some of my data pipelines and I was hoping I could use some TFX components in the same kubeflow pipeline. Is there a…
sleepyowl
  • 168
  • 5
5 votes, 2 answers

Sharing secrets in Kubeflow pipeline

I want to share some secrets with my Kubeflow pipeline so I can use them as environment variables in my containers. I've written a pipeline-secrets.yaml that looks like this: apiVersion: v1 kind: Secret metadata: name: pipeline-secrets …
João Areias
  • 1,192
  • 11
  • 41
5 votes, 1 answer

What's the difference between TFServing and KFServing on Kubeflow

TFServing and KFServing both deploy models on Kubeflow and let users easily use a model as a service without needing to know the details of Kubernetes, hiding the infrastructure layers. TFServing comes from TensorFlow; it can also run on Kubeflow or…
Kevin Su
  • 542
  • 2
  • 6
  • 24
5 votes, 2 answers

Aggregate results when using Kubeflow Pipelines kfp.ParallelFor

What is a good pattern for aggregating the results from Kubeflow Pipelines kfp.ParallelFor?
Jet Basrawi
  • 3,185
  • 2
  • 15
  • 14