I am trying to train a model using pycaffe with the Adam optimizer. The forward and backward passes work fine:

solver.net.forward()
solver.net.backward()

However, the update step (solver.update()) fails with the following error:

AttributeError: 'AdamSolver' object has no attribute 'update'
F1102 12:14:25.689537 24420 benchmark.cpp:18] Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal

When I try solver.step(1) instead, it fails with the following error:

F1101 19:28:43.213888  5038 benchmark.cpp:30] Check failed: error == cudaSuccess (71 vs. 0)  operation not supported
*** Check failure stack trace: ***
Aborted (core dumped)

I suspect an installation issue. A few test cases failed in my runtest; is that related, and what would I need to fix if I have to rebuild?

EDIT 1: I fixed all the failing runtest cases, but I still have the same problem.

Kasparov92
  • Which tests have failed? You need to provide more information. Currently it seems like your GPU does not support the CUDA version expected by Caffe. – Shai Nov 02 '17 at 11:31
  • Here are the tests that failed. I have fixed them and am now rebuilding to see whether that fixes this issue: https://stackoverflow.com/questions/47073514/caffe-runtest-fails/47074039#47074039 – Kasparov92 Nov 02 '17 at 11:46
  • It still fails, even though all the test cases now run successfully. – Kasparov92 Nov 02 '17 at 12:03
  • Do you have more than one GPU on your machine? Are both errors still occurring? – Shai Nov 02 '17 at 12:10
  • I use `caffe.set_device(0)`, so I believe it's not a multi-GPU issue, and there is only one GPU anyway. – Kasparov92 Nov 02 '17 at 12:12
  • Yes, both errors still occur, but for `solver.update` an additional error is printed: `F1102 12:14:25.689537 24420 benchmark.cpp:18] Check failed: error == cudaSuccess (10 vs. 0) invalid device ordinal` – Kasparov92 Nov 02 '17 at 12:15

1 Answer

Placing caffe.set_mode_gpu() and caffe.set_device(0) before the caffe.get_solver(solver_path) call solved the issue.
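A minimal sketch of that ordering, in case it helps. The helper name `make_gpu_solver` and the file name `solver.prototxt` are placeholders of mine, not part of pycaffe; the only pycaffe calls used are `set_mode_gpu`, `set_device` and `get_solver`. The `caffe` module is passed in as an argument purely so the call order is easy to check without a GPU.

```python
def make_gpu_solver(caffe, solver_path, device_id=0):
    """Select GPU mode and the device first, then build the solver.

    `caffe` is the imported pycaffe module. The function name and the
    default device id are illustrative assumptions, not pycaffe API.
    """
    # Configure the GPU before any solver (and its nets) is constructed.
    caffe.set_mode_gpu()
    caffe.set_device(device_id)
    # Only now create the solver from the solver definition file.
    return caffe.get_solver(solver_path)


# Typical use (requires a working pycaffe build with CUDA):
#   import caffe
#   solver = make_gpu_solver(caffe, "solver.prototxt")  # hypothetical path
#   solver.step(1)
```

With the mode and device set up front, `solver.step(1)` ran without the `cudaSuccess` check failures in my case.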

Kasparov92