Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

Question

While running a Kubeflow pipeline whose code uses TensorFlow 2.0, the error below is displayed at the end of each epoch:

W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled

Also, after some epochs, the step stops logging and fails with this error:

This step is in Failed state with this message: The node was low on resource: memory. Container main was using 100213872Ki, which exceeds its request of 0. Container wait was using 25056Ki, which exceeds its request of 0.

I'm getting the first error as well. Haven't seen the second error yet. — markemus, Feb 05 '20 at 23:07

score 5 · Answer 1 · edited May 24 '21 at 15:25

In my case, batch_size and steps_per_epoch did not match.

For example,

his = Test_model.fit_generator(datagen.flow(trainrancrop_images, trainrancrop_labels, batch_size=batchsize),
                               steps_per_epoch=len(trainrancrop_images)/batchsize,
                               validation_data=(test_images, test_labels),
                               epochs=1,
                               callbacks=[callback])

The batch_size passed to datagen.flow must correspond to the steps_per_epoch passed to Test_model.fit_generator (in my case, I had used the wrong value for steps_per_epoch).

This is one possible cause of the error, I guess.

So I think the problem arises when the batch size and the number of steps (iterations) per epoch do not correspond.

A float may also be a problem when you compute the number of steps by division...

Check your code for this issue.
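Here is a minimal sketch of one way to keep the two values consistent. It assumes the generator yields ceil(N / batch_size) batches per pass over N samples, with a smaller final batch when N is not a multiple of the batch size. The helper name steps_per_epoch_for is hypothetical and is not part of Keras.

```python
def steps_per_epoch_for(num_samples: int, batch_size: int) -> int:
    """Return the number of batches needed to cover every sample once.

    Hypothetical helper (not a Keras API). Uses integer ceiling division,
    so the result is always an int, never a float.
    """
    if num_samples <= 0 or batch_size <= 0:
        raise ValueError("num_samples and batch_size must be positive")
    # Ceiling division without going through floats.
    return (num_samples + batch_size - 1) // batch_size


# 50000 samples in batches of 128: 390 full batches plus one partial batch.
print(steps_per_epoch_for(50000, 128))  # 391
```

The result could then be passed as steps_per_epoch=steps_per_epoch_for(len(trainrancrop_images), batchsize) instead of the plain division.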

Good luck :)

score 5 · Answer 2 · answered Jun 12 '20 at 13:44

Upgrading TensorFlow from 2.1 to 2.2 fixed this issue for me. I didn't have to switch to the tf-nightly version.

Safwan

Upgraded TensorFlow 2.1 to TensorFlow 2.2 and this issue is gone. – user3284804 Jul 02 '20 at 23:37
@user3284804 - Please consider upvoting if this answer helped you. Thanks. – Safwan Jul 03 '20 at 05:10
I am running tensorflow-gpu in a conda env and it keeps installing version 2.1. If I try to upgrade it using pip3 install --upgrade tensorflow-gpu, I can't use it anymore. Does anyone know how to upgrade tensorflow-gpu inside a conda env? – Dhouibi iheb Sep 07 '20 at 01:23
@Dhouibiiheb What do you mean by you cannot use it anymore? – Safwan Sep 08 '20 at 03:38
@Safwan Meaning that when I run pip install --upgrade tensorflow==2.2 (or 2.3), tensorflow won't work anymore. As far as I know, conda env only supports tf 2.1 for now, not sure though. – Dhouibi iheb Sep 09 '20 at 05:10
@Dhouibiiheb `conda` supports tf2.2 now. Use `conda install -c anaconda tensorflow-gpu` to install tf2.2 – Safwan Sep 09 '20 at 07:45
@Safwan I tried that already, and nothing changes: it won't update tf to tf2.2. – Dhouibi iheb Sep 10 '20 at 03:13

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

This was due to incompatible CUDA and TensorFlow versions. The versions below work well with each other:

tensorflow-gpu==2.0.0

tensorflow-addons==0.6.0

nvidia/cuda:10.0-cudnn7-runtime
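As a rough sketch, the Python package pins listed above could be checked inside the container like this. It uses only the standard library (importlib.metadata, Python 3.8+). The names PINS, installed_version and find_mismatches are hypothetical, and the CUDA image tag is not a Python package, so it is not checked here.

```python
from importlib import metadata

# Pins copied from the list above (assumption: both were installed with pip).
PINS = {"tensorflow-gpu": "2.0.0", "tensorflow-addons": "0.6.0"}


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not met."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

Running it prints one line per package whose installed version differs from the pin, and nothing if everything matches.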

answered Feb 25 '20 at 05:20

Radhi

score 1 · Answer 4 · answered Feb 19 '20 at 14:07

I have the same problem. People have claimed that the warning is superfluous and that it has been removed in tf-nightly (see here). But the memory leak is still there for each epoch.

MH Yip

score 0 · Answer 5 · answered Feb 11 '20 at 09:58

In my case, I installed tf-nightly and now it's working, though I am new to TensorFlow. I followed this link.

You can try it.

Shantanu Nath

score 0 · Answer 6 · edited Feb 23 '21 at 20:01

To fix the problem you can add workers=1 in model.fit(...).

answered Jan 25 '21 at 08:19

Sajad Homayoun

score 0 · Answer 7 · answered May 11 '21 at 21:40

I tried the following steps and they worked in my case:

conda install tensorflow=2.0.0
conda install -c conda-forge keras=2.3.0

Shruti Jadon
