
I am writing a Python application that uses OpenCV's Python bindings to do marker detection and other image processing. I would like to use OpenCV's CUDA modules to CUDA-accelerate certain parts of my application, and noticed in their .hpp files that they seem to use the OpenCV export macros for Python and Java. However, I do not seem to be able to access those CUDA functions, even though I am building OpenCV with WITH_CUDA=ON.

Is it necessary to use a wrapper such as PyCUDA in order to access the GPU functions, such as threshold in cudaarithm (declared below)? Or are these CUDA-accelerated functions already being used when I call cv2.threshold() in my Python code (rather than the regular, CPU-based implementation)?

CV_EXPORTS double threshold(InputArray src, OutputArray dst, double thresh, double maxval, int type, Stream& stream = Stream::Null());

The submodules I see for cv2 are the following:

  • Error
  • aruco
  • detail
  • fisheye
  • flann
  • instr
  • ml
  • ocl
  • ogl
  • videostab

cv2.cuda, cv2.gpu, and cv2.cudaarithm all return with an AttributeError.

The CMake instruction I am running to build OpenCV is as follows:

cmake -DOPENCV_EXTRA_MODULES_PATH=/usr/local/lib/opencv_contrib/modules/ \
    -D WITH_CUDA=ON -D CUDA_FAST_MATH=1 \
    -D ENABLE_PRECOMPILED_HEADERS=OFF \
    -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF \
    -D BUILD_opencv_java=OFF \
    -DBUILD_opencv_bgsegm=OFF -DBUILD_opencv_bioinspired=OFF -DBUILD_opencv_ccalib=OFF \
    -DBUILD_opencv_cnn_3dobj=OFF -DBUILD_opencv_contrib_world=OFF -DBUILD_opencv_cvv=OFF \
    -DBUILD_opencv_datasets=OFF -DBUILD_opencv_dnn=OFF -DBUILD_opencv_dnns_easily_fooled=OFF \
    -DBUILD_opencv_dpm=OFF -DBUILD_opencv_face=OFF -DBUILD_opencv_fuzzy=OFF \
    -DBUILD_opencv_hdf=OFF -DBUILD_opencv_line_descriptor=OFF -DBUILD_opencv_matlab=OFF \
    -DBUILD_opencv_optflow=OFF -DBUILD_opencv_plot=OFF -DBUILD_opencv_README.md=OFF \
    -DBUILD_opencv_reg=OFF -DBUILD_opencv_rgbd=OFF -DBUILD_opencv_saliency=OFF \
    -DBUILD_opencv_sfm=OFF -DBUILD_opencv_stereo=OFF -DBUILD_opencv_structured_light=OFF \
    -DBUILD_opencv_surface_matching=OFF -DBUILD_opencv_text=OFF -DBUILD_opencv_tracking=OFF \
    -DBUILD_opencv_viz=OFF -DBUILD_opencv_xfeatures2d=OFF -DBUILD_opencv_ximgproc=OFF \
    -DBUILD_opencv_xobjdetect=OFF -DBUILD_opencv_xphoto=OFF ..
ostrumvulpes
  • As you noticed, OpenCV has its own Python bindings to C++ functions. You don't need PyCUDA as far as I know. Which version of OpenCV are you using? Accessing OpenCV CUDA functions should be straightforward. – NAmorim Feb 09 '17 at 11:32
  • Hey @NAmorim, thanks for commenting! I am using OpenCV 3.2.0-dev. However, when I load the modules available for cv2, I do not see a submodule for CUDA (see updated question). Are functions that have CUDA-accelerated counterparts already substituted in the Python .so? – ostrumvulpes Feb 09 '17 at 22:20
  • Starting from OpenCV 4, Python bindings to CUDA-accelerated code should work. Here is a post about how to achieve it: [**Accelerating OpenCV 4 – build with CUDA 10.0, Intel MKL + TBB and python bindings in Windows**](https://jamesbowley.co.uk/build-opencv-4-0-0-with-cuda-10-0-and-intel-mkl-tbb-in-windows/) – nchaumont Mar 19 '19 at 21:26

4 Answers


So, as confirmed in the answer and comment thread with @NAmorim, there are no accessible Python bindings to OpenCV's various CUDA modules (as of OpenCV 3.x).

I was able to get around this restriction by using Cython to gain access to the CUDA functions I needed and implementing the necessary logic to convert my Python objects (mainly NumPy arrays) to OpenCV C/C++ objects and back.

Working Code

I first wrote a Cython definition file, GpuWrapper.pxd. The purpose of this file is to reference external C/C++ classes and methods, such as the CUDA methods I am interested in.

from libcpp cimport bool
from cpython.ref cimport PyObject

# References PyObject to OpenCV object conversion code borrowed from OpenCV's own conversion file, cv2.cpp
cdef extern from 'pyopencv_converter.cpp':
    cdef PyObject* pyopencv_from(const Mat& m)
    cdef bool pyopencv_to(PyObject* o, Mat& m)

cdef extern from 'opencv2/imgproc.hpp' namespace 'cv':
    cdef enum InterpolationFlags:
        INTER_NEAREST = 0
    cdef enum ColorConversionCodes:
        COLOR_BGR2GRAY

cdef extern from 'opencv2/core/core.hpp':
    cdef int CV_8UC1
    cdef int CV_32FC1

cdef extern from 'opencv2/core/core.hpp' namespace 'cv':
    cdef cppclass Size_[T]:
        Size_() except +
        Size_(T width, T height) except +
        T width
        T height
    ctypedef Size_[int] Size2i
    ctypedef Size2i Size
    cdef cppclass Scalar_[T]:
        Scalar_() except +
        Scalar_(T v0) except +
    ctypedef Scalar_[double] Scalar

cdef extern from 'opencv2/core/core.hpp' namespace 'cv':
    cdef cppclass Mat:
        Mat() except +
        void create(int, int, int) except +
        void* data
        int rows
        int cols

cdef extern from 'opencv2/core/cuda.hpp' namespace 'cv::cuda':
    cdef cppclass GpuMat:
        GpuMat() except +
        void upload(Mat arr) except +
        void download(Mat dst) const
    cdef cppclass Stream:
        Stream() except +

cdef extern from 'opencv2/cudawarping.hpp' namespace 'cv::cuda':
    cdef void warpPerspective(GpuMat src, GpuMat dst, Mat M, Size dsize, int flags, int borderMode, Scalar borderValue, Stream& stream)
    # Function using default values
    cdef void warpPerspective(GpuMat src, GpuMat dst, Mat M, Size dsize, int flags)

We also need the ability to convert Python objects to OpenCV objects. I copied the first couple hundred lines from OpenCV's modules/python/src2/cv2.cpp. You can find that code below in the appendix.

We can finally write our Cython wrapper methods to call OpenCV's CUDA functions! This is part of the Cython implementation file, GpuWrapper.pyx.

import numpy as np  # Import Python functions, attributes, submodules of numpy
cimport numpy as np  # Import numpy C/C++ API

np.import_array()  # Initialize NumPy's C API; forgetting this leads to segfaults (see comment thread below)

def cudaWarpPerspectiveWrapper(np.ndarray[np.uint8_t, ndim=2] _src,
                               np.ndarray[np.float32_t, ndim=2] _M,
                               _size_tuple,
                               int _flags=INTER_NEAREST):
    # Create GPU/device InputArray for src
    cdef Mat src_mat
    cdef GpuMat src_gpu
    pyopencv_to(<PyObject*> _src, src_mat)
    src_gpu.upload(src_mat)

    # Create CPU/host InputArray for M
    cdef Mat M_mat = Mat()
    pyopencv_to(<PyObject*> _M, M_mat)

    # Create Size object from size tuple
    # Note that size/shape in Python is handled in row-major-order -- therefore, width is [1] and height is [0]
    cdef Size size = Size(<int> _size_tuple[1], <int> _size_tuple[0])

    # Create empty GPU/device OutputArray for dst
    cdef GpuMat dst_gpu = GpuMat()
    warpPerspective(src_gpu, dst_gpu, M_mat, size, INTER_NEAREST)

    # Get result of dst
    cdef Mat dst_host
    dst_gpu.download(dst_host)
    cdef np.ndarray out = <np.ndarray> pyopencv_from(dst_host)
    return out

After running a setup script to cythonize and compile this code (see the appendix), we can import GpuWrapper as a Python module and run cudaWarpPerspectiveWrapper. This allowed me to run the code through CUDA with only a mismatch of 0.34722222222222854% -- quite exciting!
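
For illustration, a minimal usage sketch (the random input image and the shift-only homography here are made up, and comparing against cv2.warpPerspective is just one way to measure that mismatch):

import cv2
import numpy as np

import GpuWrapper  # the extension module produced by setupGpuWrapper.py

# Illustrative inputs: a random grayscale image and a small horizontal shift
src = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
M = np.eye(3, dtype=np.float32)
M[0, 2] = 10.5

# CUDA-accelerated warp via the Cython wrapper; the size tuple is (height, width)
warped_gpu = GpuWrapper.cudaWarpPerspectiveWrapper(src, M, src.shape[:2])

# CPU reference; cv2.warpPerspective takes dsize as (width, height)
warped_cpu = cv2.warpPerspective(src, M, (src.shape[1], src.shape[0]),
                                 flags=cv2.INTER_NEAREST)

print('mismatch: {}%'.format(np.mean(warped_gpu != warped_cpu) * 100))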

References (can only post max of 2)

Appendix

pyopencv_converter.cpp

#include <Python.h>
#include "numpy/ndarrayobject.h"
#include "opencv2/core/core.hpp"

static PyObject* opencv_error = 0;

// === FAIL MESSAGE ====================================================================================================

static int failmsg(const char *fmt, ...)
{
    char str[1000];

    va_list ap;
    va_start(ap, fmt);
    vsnprintf(str, sizeof(str), fmt, ap);
    va_end(ap);

    PyErr_SetString(PyExc_TypeError, str);
    return 0;
}

struct ArgInfo
{
    const char * name;
    bool outputarg;
    // more fields may be added if necessary

    ArgInfo(const char * name_, bool outputarg_)
        : name(name_)
        , outputarg(outputarg_) {}

    // to match with older pyopencv_to function signature
    operator const char *() const { return name; }
};

// === THREADING =======================================================================================================

class PyAllowThreads
{
public:
    PyAllowThreads() : _state(PyEval_SaveThread()) {}
    ~PyAllowThreads()
    {
        PyEval_RestoreThread(_state);
    }
private:
    PyThreadState* _state;
};

class PyEnsureGIL
{
public:
    PyEnsureGIL() : _state(PyGILState_Ensure()) {}
    ~PyEnsureGIL()
    {
        PyGILState_Release(_state);
    }
private:
    PyGILState_STATE _state;
};

// === ERROR HANDLING ==================================================================================================

#define ERRWRAP2(expr) \
try \
{ \
    PyAllowThreads allowThreads; \
    expr; \
} \
catch (const cv::Exception &e) \
{ \
    PyErr_SetString(opencv_error, e.what()); \
    return 0; \
}

// === USING NAMESPACE CV ==============================================================================================

using namespace cv;

// === NUMPY ALLOCATOR =================================================================================================

class NumpyAllocator : public MatAllocator
{
public:
    NumpyAllocator() { stdAllocator = Mat::getStdAllocator(); }
    ~NumpyAllocator() {}

    UMatData* allocate(PyObject* o, int dims, const int* sizes, int type, size_t* step) const
    {
        UMatData* u = new UMatData(this);
        u->data = u->origdata = (uchar*)PyArray_DATA((PyArrayObject*) o);
        npy_intp* _strides = PyArray_STRIDES((PyArrayObject*) o);
        for( int i = 0; i < dims - 1; i++ )
            step[i] = (size_t)_strides[i];
        step[dims-1] = CV_ELEM_SIZE(type);
        u->size = sizes[0]*step[0];
        u->userdata = o;
        return u;
    }

    UMatData* allocate(int dims0, const int* sizes, int type, void* data, size_t* step, int flags, UMatUsageFlags usageFlags) const
    {
        if( data != 0 )
        {
            CV_Error(Error::StsAssert, "The data should normally be NULL!");
            // probably this is safe to do in such extreme case
            return stdAllocator->allocate(dims0, sizes, type, data, step, flags, usageFlags);
        }
        PyEnsureGIL gil;

        int depth = CV_MAT_DEPTH(type);
        int cn = CV_MAT_CN(type);
        const int f = (int)(sizeof(size_t)/8);
        int typenum = depth == CV_8U ? NPY_UBYTE : depth == CV_8S ? NPY_BYTE :
                      depth == CV_16U ? NPY_USHORT : depth == CV_16S ? NPY_SHORT :
                      depth == CV_32S ? NPY_INT : depth == CV_32F ? NPY_FLOAT :
                      depth == CV_64F ? NPY_DOUBLE : f*NPY_ULONGLONG + (f^1)*NPY_UINT;
        int i, dims = dims0;
        cv::AutoBuffer<npy_intp> _sizes(dims + 1);
        for( i = 0; i < dims; i++ )
            _sizes[i] = sizes[i];
        if( cn > 1 )
            _sizes[dims++] = cn;
        PyObject* o = PyArray_SimpleNew(dims, _sizes, typenum);
        if(!o)
            CV_Error_(Error::StsError, ("The numpy array of typenum=%d, ndims=%d can not be created", typenum, dims));
        return allocate(o, dims0, sizes, type, step);
    }

    bool allocate(UMatData* u, int accessFlags, UMatUsageFlags usageFlags) const
    {
        return stdAllocator->allocate(u, accessFlags, usageFlags);
    }

    void deallocate(UMatData* u) const
    {
        if(!u)
            return;
        PyEnsureGIL gil;
        CV_Assert(u->urefcount >= 0);
        CV_Assert(u->refcount >= 0);
        if(u->refcount == 0)
        {
            PyObject* o = (PyObject*)u->userdata;
            Py_XDECREF(o);
            delete u;
        }
    }

    const MatAllocator* stdAllocator;
};

// === ALLOCATOR INITIALIZATION ========================================================================================

NumpyAllocator g_numpyAllocator;

// === CONVERTOR FUNCTIONS =============================================================================================

template<typename T> static
bool pyopencv_to(PyObject* obj, T& p, const char* name = "<unknown>");

template<typename T> static
PyObject* pyopencv_from(const T& src);

enum { ARG_NONE = 0, ARG_MAT = 1, ARG_SCALAR = 2 };

// special case, when the convertor needs full ArgInfo structure
static bool pyopencv_to(PyObject* o, Mat& m, const ArgInfo info)
{
    bool allowND = true;
    if(!o || o == Py_None)
    {
        if( !m.data )
            m.allocator = &g_numpyAllocator;
        return true;
    }

    if( PyInt_Check(o) )
    {
        double v[] = {static_cast<double>(PyInt_AsLong((PyObject*)o)), 0., 0., 0.};
        m = Mat(4, 1, CV_64F, v).clone();
        return true;
    }
    if( PyFloat_Check(o) )
    {
        double v[] = {PyFloat_AsDouble((PyObject*)o), 0., 0., 0.};
        m = Mat(4, 1, CV_64F, v).clone();
        return true;
    }
    if( PyTuple_Check(o) )
    {
        int i, sz = (int)PyTuple_Size((PyObject*)o);
        m = Mat(sz, 1, CV_64F);
        for( i = 0; i < sz; i++ )
        {
            PyObject* oi = PyTuple_GET_ITEM(o, i);
            if( PyInt_Check(oi) )
                m.at<double>(i) = (double)PyInt_AsLong(oi);
            else if( PyFloat_Check(oi) )
                m.at<double>(i) = (double)PyFloat_AsDouble(oi);
            else
            {
                failmsg("%s is not a numerical tuple", info.name);
                m.release();
                return false;
            }
        }
        return true;
    }

    if( !PyArray_Check(o) )
    {
        failmsg("%s is not a numpy array, neither a scalar", info.name);
        return false;
    }

    PyArrayObject* oarr = (PyArrayObject*) o;

    bool needcopy = false, needcast = false;
    int typenum = PyArray_TYPE(oarr), new_typenum = typenum;
    int type = typenum == NPY_UBYTE ? CV_8U :
               typenum == NPY_BYTE ? CV_8S :
               typenum == NPY_USHORT ? CV_16U :
               typenum == NPY_SHORT ? CV_16S :
               typenum == NPY_INT ? CV_32S :
               typenum == NPY_INT32 ? CV_32S :
               typenum == NPY_FLOAT ? CV_32F :
               typenum == NPY_DOUBLE ? CV_64F : -1;

    if( type < 0 )
    {
        if( typenum == NPY_INT64 || typenum == NPY_UINT64 || typenum == NPY_LONG )
        {
            needcopy = needcast = true;
            new_typenum = NPY_INT;
            type = CV_32S;
        }
        else
        {
            failmsg("%s data type = %d is not supported", info.name, typenum);
            return false;
        }
    }

#ifndef CV_MAX_DIM
    const int CV_MAX_DIM = 32;
#endif

    int ndims = PyArray_NDIM(oarr);
    if(ndims >= CV_MAX_DIM)
    {
        failmsg("%s dimensionality (=%d) is too high", info.name, ndims);
        return false;
    }

    int size[CV_MAX_DIM+1];
    size_t step[CV_MAX_DIM+1];
    size_t elemsize = CV_ELEM_SIZE1(type);
    const npy_intp* _sizes = PyArray_DIMS(oarr);
    const npy_intp* _strides = PyArray_STRIDES(oarr);
    bool ismultichannel = ndims == 3 && _sizes[2] <= CV_CN_MAX;

    for( int i = ndims-1; i >= 0 && !needcopy; i-- )
    {
        // these checks handle cases of
        //  a) multi-dimensional (ndims > 2) arrays, as well as simpler 1- and 2-dimensional cases
        //  b) transposed arrays, where _strides[] elements go in non-descending order
        //  c) flipped arrays, where some of _strides[] elements are negative
        // the _sizes[i] > 1 is needed to avoid spurious copies when NPY_RELAXED_STRIDES is set
        if( (i == ndims-1 && _sizes[i] > 1 && (size_t)_strides[i] != elemsize) ||
            (i < ndims-1 && _sizes[i] > 1 && _strides[i] < _strides[i+1]) )
            needcopy = true;
    }

    if( ismultichannel && _strides[1] != (npy_intp)elemsize*_sizes[2] )
        needcopy = true;

    if (needcopy)
    {
        if (info.outputarg)
        {
            failmsg("Layout of the output array %s is incompatible with cv::Mat (step[ndims-1] != elemsize or step[1] != elemsize*nchannels)", info.name);
            return false;
        }

        if( needcast ) {
            o = PyArray_Cast(oarr, new_typenum);
            oarr = (PyArrayObject*) o;
        }
        else {
            oarr = PyArray_GETCONTIGUOUS(oarr);
            o = (PyObject*) oarr;
        }

        _strides = PyArray_STRIDES(oarr);
    }

    // Normalize strides in case NPY_RELAXED_STRIDES is set
    size_t default_step = elemsize;
    for ( int i = ndims - 1; i >= 0; --i )
    {
        size[i] = (int)_sizes[i];
        if ( size[i] > 1 )
        {
            step[i] = (size_t)_strides[i];
            default_step = step[i] * size[i];
        }
        else
        {
            step[i] = default_step;
            default_step *= size[i];
        }
    }

    // handle degenerate case
    if( ndims == 0) {
        size[ndims] = 1;
        step[ndims] = elemsize;
        ndims++;
    }

    if( ismultichannel )
    {
        ndims--;
        type |= CV_MAKETYPE(0, size[2]);
    }

    if( ndims > 2 && !allowND )
    {
        failmsg("%s has more than 2 dimensions", info.name);
        return false;
    }

    m = Mat(ndims, size, type, PyArray_DATA(oarr), step);
    m.u = g_numpyAllocator.allocate(o, ndims, size, type, step);
    m.addref();

    if( !needcopy )
    {
        Py_INCREF(o);
    }
    m.allocator = &g_numpyAllocator;

    return true;
}

template<>
bool pyopencv_to(PyObject* o, Mat& m, const char* name)
{
    return pyopencv_to(o, m, ArgInfo(name, 0));
}

template<>
PyObject* pyopencv_from(const Mat& m)
{
    if( !m.data )
        Py_RETURN_NONE;
    Mat temp, *p = (Mat*)&m;
    if(!p->u || p->allocator != &g_numpyAllocator)
    {
        temp.allocator = &g_numpyAllocator;
        ERRWRAP2(m.copyTo(temp));
        p = &temp;
    }
    PyObject* o = (PyObject*)p->u->userdata;
    Py_INCREF(o);
    return o;
}

setupGpuWrapper.py

import subprocess
import os
import numpy as np
from distutils.core import setup, Extension
from Cython.Build import cythonize
from Cython.Distutils import build_ext

"""
Run setup with the following command:
```
python setupGpuWrapper.py build_ext --inplace
```
"""

# Determine current directory of this setup file to find our module
CUR_DIR = os.path.dirname(__file__)
# Use pkg-config to determine library locations and include locations
opencv_libs_str = subprocess.check_output("pkg-config --libs opencv".split()).decode()
opencv_incs_str = subprocess.check_output("pkg-config --cflags opencv".split()).decode()
# Parse into usable format for Extension call
opencv_libs = [str(lib) for lib in opencv_libs_str.strip().split()]
opencv_incs = [str(inc) for inc in opencv_incs_str.strip().split()]

extensions = [
    Extension('GpuWrapper',
              sources=[os.path.join(CUR_DIR, 'GpuWrapper.pyx')],
              language='c++',
              include_dirs=[np.get_include()] + opencv_incs,
              extra_link_args=opencv_libs)
]

setup(
    cmdclass={'build_ext': build_ext},
    name="GpuWrapper",
    ext_modules=cythonize(extensions)
)
ostrumvulpes
  • I am following your description, but when applying the cudaWarpPerspectiveWrapper function I get a ```Segmentation fault (core dumped)``` error, and I'm not sure where I went wrong. When compiling the Cython code, I got 2 warnings: ```cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++``` and ```warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]```, which seem to be fine. I am using OpenCV 3.3, Python 2.7 and Cython 0.26 – Xinyao Wang Aug 23 '17 at 21:09
  • Hey Xinyao, I think I got those warnings too -- they _should_ be harmless. I think I was using OpenCV 3.2, Python 3.2, and Cython 0.25. It may be a discrepancy between the Python versions? Something important you also have to do with numpy is to call numpy.import_array() -- see here: https://docs.scipy.org/doc/numpy-1.10.0/user/c-info.how-to-extend.html#required-subroutine. I remember getting frustrating seg faults when forgetting to call that! – ostrumvulpes Aug 25 '17 at 00:13
  • Thanks for responding! I tried different Python (3.5 and 2.7) and OpenCV (3.1, 3.2, 3.3) versions, and it is still not working. I think it is caused by not calling numpy.import_array(), since I had no idea about that before you mentioned it. I posted a question here: [link](https://stackoverflow.com/questions/45850197/not-able-to-convert-numpy-array-to-opencv-mat-in-cython-when-trying-to-write-c) It would be great if you have time to take a look. I found other code on GitHub just now to work around converting the numpy array to a cv::Mat, but it would be wonderful if you could provide some possible solutions. – Xinyao Wang Aug 25 '17 at 01:00
  • I was not able to figure out where to include numpy.import_array(); it seems it should not go in my .py file, since numpy has no attribute import_array in Python. Just too new to this stuff. – Xinyao Wang Aug 25 '17 at 01:04
  • Hi, I tried to put numpy.import_array() in GpuWrapper.pyx and it worked like a charm right now. Thank you for your solution, really appreciated! – Xinyao Wang Aug 25 '17 at 02:08
  • @ostrumvulpes many thanks to both of you! this is a great time saver for Cython newbies. – nazikus Jul 03 '18 at 15:53

I did some testing on this with OpenCV 4.0.0. @nchaumont mentioned that starting with OpenCV 4, there were Python bindings for CUDA included.

As of at least OpenCV 4.1.0, possibly earlier, the default Python bindings include CUDA, provided that OpenCV was built with CUDA support.

Most functionality appears to be exposed as cv2.cuda.thing (for example, cv2.cuda.cvtColor()).

Currently, these bindings lack online documentation -- for example, the GPU Canny edge detector's documentation makes no mention of Python. You can use the help function at Python's REPL to see the C++ docs, though, which should be mostly equivalent.
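
For illustration, here is a minimal sketch of that cv2.cuda workflow, assuming an OpenCV 4.x build with CUDA support (the input image is arbitrary):

import cv2
import numpy as np

# Any BGR image will do; a random one is used here for illustration
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

gpu_frame = cv2.cuda_GpuMat()   # allocate a GpuMat on the device
gpu_frame.upload(frame)         # host -> device copy

gpu_gray = cv2.cuda.cvtColor(gpu_frame, cv2.COLOR_BGR2GRAY)  # runs on the GPU

gray = gpu_gray.download()      # device -> host copy, back to a NumPy array

# The C++ documentation can be browsed from the REPL:
help(cv2.cuda.cvtColor)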

c-x-berger
  • Since OpenCV 4.4.0 cv::cuda::CascadeClassifier should be back, I however can't find the python binding. It's not under cv2.cuda.CascadeClassifier. Any chance someone else found this? – Jop Knoppers Sep 01 '20 at 12:25
  • cv2.cuda_CascadeClassifier gives a segmentation fault.. – Jop Knoppers Sep 01 '20 at 12:33

I used the following approach to access OpenCV's C++ CUDA methods from Python:

  1. Create custom opencv_contrib module
  2. Write C++ code to wrap the OpenCV CUDA method
  3. Using OpenCV python bindings, expose your custom method
  4. Build opencv with opencv_contrib
  5. Run python code to test

I created a small GitHub repo to demonstrate this.

Neeraj Gulia

Or, are these CUDA-accelerated functions already being used if I call cv2.threshold() in my Python code (rather than the regular, CPU-based implementation)?

No, you have to explicitly call them from the GPU-accelerated module; calling cv2.threshold() will simply run the CPU version.
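
For illustration, assuming a build where the CUDA bindings are exposed (see the other answers regarding OpenCV 4), a rough sketch of the two paths; the CUDA variant lives in the cuda module and operates on GpuMat objects rather than NumPy arrays:

import cv2
import numpy as np

img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

# CPU path: plain cv2.threshold on a NumPy array
ret, thr_cpu = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# GPU path: upload to a GpuMat and call the CUDA variant explicitly
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)
ret, thr_gpu = cv2.cuda.threshold(gpu_img, 127, 255, cv2.THRESH_BINARY)
result = thr_gpu.download()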

Since the Python API of OpenCV wraps the C++ functions, checking the C++ API usually offers useful hints about where the functions/modules live.

For instance, the transition guide shows the API changes made from OpenCV 2.X to 3.X. There, the GPU module is accessed via cv2.cuda in OpenCV 3.X (cv2.gpu in previous versions), and the cuda module in 3.X is divided into several smaller pieces:

  • cuda - CUDA-accelerated Computer Vision
  • cudaarithm - Operations on Matrices
  • cudabgsegm - Background Segmentation
  • cudacodec - Video Encoding/Decoding
  • cudafeatures2d - Feature Detection and Description
  • cudafilters - Image Filtering
  • cudaimgproc - Image Processing
  • cudalegacy - Legacy support
  • cudaoptflow - Optical Flow
  • cudastereo - Stereo Correspondence
  • cudawarping - Image Warping
  • cudev - Device layer

You should search for these modules within cv2.
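
For example, a quick way to check what your own build actually exposes:

import cv2

# Confirm the build has CUDA enabled -- look for a line such as "Use Cuda: YES"
print(cv2.getBuildInformation())

# List whatever CUDA-related names made it into the Python bindings of this build
print([name for name in dir(cv2) if 'cuda' in name.lower()])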

NAmorim
  • Unfortunately, I cannot find any of these modules which seem like they should be there: `>>> cv2.cuda Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'cuda' >>> cv2.gpu Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'gpu' >>> cv2.cudaarithm Traceback (most recent call last): File "", line 1, in AttributeError: 'module' object has no attribute 'cudaarithm'` – ostrumvulpes Feb 10 '17 at 20:44
  • Did you get the library to work in the meantime? Have you tried to check if building OpenCV + CUDA was successful? For instance, if in Python you run **print cv2.getBuildInformation()** you should get all the CMake flags that were activated. There's a line that should say **Use Cuda: Yes**. – NAmorim Feb 13 '17 at 11:08
  • Hey @NAmorim, I indeed have `Use Cuda: Yes` enabled. It seems that Python does not have any bindings to the CUDA-related modules, as the GpuArray types are not exposed to Python in the first place. The solution I'm investigating currently is to use PyCUDA and ctypes to call my own C++ code from Python, which in turn calls the OpenCV CUDA functions. I will see if this is a good solution and try to keep this post updated! – ostrumvulpes Feb 13 '17 at 22:07
  • @ostrumvulpes Well that's new to me. I have used CUDA in C++ and I thought Python would have bindings for it too (good thing the OpenCV Python documentation is nearly nonexistent...). Good luck with that! – NAmorim Feb 14 '17 at 10:37