I'm trying to create a Python package and publish it to PyPI for others to use. My package needs to call some custom CUDA kernels, but I don't want users to need the CUDA toolkit/nvcc on their machines to compile the kernels when installing the package. So I want to compile the kernels beforehand into .so/.dll/.cubin files and then call them from Python. What is the best way to achieve this, especially in a way that is compatible with different operating systems and different GPU architectures?

talonmies
omer sahban
  • one approach is python with ctypes: https://stackoverflow.com/questions/45515526/cuda-shared-memory-issue-and-using-cuda-with-python-ctypes/45524623#45524623 – Robert Crovella Jul 28 '23 at 03:20

0 Answers