
I am trying to implement a CNN for image classification. For the life of me, I cannot get my code to train: it fails with an error, but I cannot understand the traceback. I have included the traceback and the relevant code below. Any help is appreciated.

Here is the traceback:

Traceback (most recent call last):
  File "/scratch/d/dsussman/dsherman/endo_git_v2/sherman_dataframe_patch_class.py", line 127, in <module>
    verbose=2)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1178, in fit
    tmp_logs = self.train_function(iterator)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3024, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1961, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/home/d/dsussman/dsherman/.conda/envs/myNewEnv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found:  No algorithm worked!
         [[node resnet50/conv1_conv/Conv2D (defined at scratch/d/dsussman/dsherman/endo_git_v2/sherman_dataframe_patch_class.py:127) ]]
         [[assert_less_equal/Assert/AssertGuard/pivot_f/_13/_49]]
  (1) Not found:  No algorithm worked!
         [[node resnet50/conv1_conv/Conv2D (defined at scratch/d/dsussman/dsherman/endo_git_v2/sherman_dataframe_patch_class.py:127) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_11891]

Function call stack:
train_function -> train_function

2021-09-10 10:45:12.155575: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
         [[{{node PyFunc}}]]

Edit: here is the code that leads up to the call on line 127:

import tensorflow

# ResNet50 trained from scratch on 100x100 single-channel images, 3 classes
model = tensorflow.keras.applications.resnet50.ResNet50(weights=None,
                                                        input_shape=(100, 100, 1),
                                                        pooling=None,
                                                        classes=3)

model.compile(optimizer=tensorflow.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy', tensorflow.keras.metrics.FalseNegatives()])

##########          TRAIN MODEL         ##########

# train_set and val_set are built earlier in the script (not shown)
history = model.fit(train_set,
                    epochs=50,
                    validation_data=val_set,
                    verbose=2)
  • I've found that TensorFlow call stacks are often not very useful: TensorFlow compiles your code into a graph, so the call stack shows its own internal code and none of yours. But what function is being called on line 127? – user253751 Sep 10 '21 at 14:57
  • It's my `model.fit()` call. I will edit the question to show what it actually is – Daniel Sherman Sep 10 '21 at 15:10
  • Does this answer your question? https://stackoverflow.com/questions/51025188/tensorflow-notfound-error – user253751 Sep 10 '21 at 15:14
  • I don't think so, because I'm not loading a model from an output folder. I am also not using any checkpoints – Daniel Sherman Sep 10 '21 at 15:18
  • Hi! Could you try again after limiting TensorFlow's GPU memory allocation (for example, by enabling memory growth)? References: https://stackoverflow.com/questions/59340465/how-to-solve-no-algorithm-worked-keras-error https://forums.developer.nvidia.com/t/tensorflow-issue-op-requires-failed-at-conv-ops-fused-impl-h-697-not-found-no-algorithm-worked/167905/4 https://github.com/keras-team/keras/issues/7226 –  Oct 10 '21 at 13:43
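
The GPU memory suggestion in the last comment can be sketched as follows. This is a minimal sketch, not a confirmed fix for the error above: it assumes TensorFlow 2.x and that the "No algorithm worked!" failure comes from cuDNN being unable to get the GPU memory it needs, which is only one possible cause. The helper name `enable_gpu_memory_growth` is hypothetical. It has to run before the model is built or any other GPU work is done.

```python
import tensorflow as tf


def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand instead of all at once.

    Hypothetical helper for illustration. Returns the number of GPUs
    TensorFlow can see (0 on a CPU-only machine).
    """
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError as err:
            # Raised when the GPU has already been initialized, i.e. this
            # was called too late in the script.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return len(gpus)


# Call this at the top of the script, before creating the ResNet50 model.
n_gpus = enable_gpu_memory_growth()
print(f"GPUs visible to TensorFlow: {n_gpus}")
```

If the error persists with memory growth enabled, the next thing to check would be whether the installed CUDA and cuDNN versions match what this TensorFlow build expects.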
