
I would like to create a subclass of scikit-learn's sklearn.cluster.KMeans, and I would like to write it in Cython for performance reasons. Is this possible?

There is an old issue, https://github.com/scikit-learn/scikit-learn/issues/2057, that seems related and discusses the (non-)publication of pxd files. Deriving in Cython from one of the main classes in sklearn would be really useful, so I am asking whether there is a solution now.

My source file, otherkmeans.pyx:

from sklearn.cluster cimport KMeans
cimport cython
@cython.cclass
class OtherKMeans(KMeans):
    def __init__(self, kappa, **kwargs):
        self.kappa = kappa
        super().__init__(**kwargs)

My setup file, setup1.py:

from setuptools import setup
from Cython.Build import cythonize
setup(
   ext_modules=cythonize("otherkmeans.pyx"),
   compiler_directives={'language_level' : "3"}
)

Running

python setup1.py build_ext --inplace

produces:

src> python setup1.py build_ext --inplace
Compiling otherkmeans.pyx because it changed.
[1/1] Cythonizing otherkmeans.pyx
/home/miller/miniconda2/envs/bkm9/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /home/musk/new-k-means/src/otherkmeans.pyx
  tree = Parsing.p_module(s, pxd, full_module_name)

Error compiling Cython file:
------------------------------------------------------------
...
from sklearn.cluster cimport KMeans
^
------------------------------------------------------------

otherkmeans.pyx:1:0: 'sklearn/cluster.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
from sklearn.cluster cimport KMeans
^
------------------------------------------------------------

otherkmeans.pyx:1:0: 'sklearn/cluster/KMeans.pxd' not found

Error compiling Cython file:
------------------------------------------------------------
...
from sklearn.cluster cimport KMeans
cimport cython
@cython.cclass
class OtherKMeans(KMeans):
                 ^
------------------------------------------------------------

otherkmeans.pyx:4:18: First base of 'OtherKMeans' is not an extension type
Traceback (most recent call last):
  File "/home/miller/new-k-means/src/setup1.py", line 4, in <module>
    ext_modules=cythonize("otherkmeans.pyx"),
  File "/home/miller/miniconda2/envs/bkm9/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 1127, in cythonize
    cythonize_one(*args)
  File "/home/miller/miniconda2/envs/bkm9/lib/python3.9/site-packages/Cython/Build/Dependencies.py", line 1250, in cythonize_one
    raise CompileError(None, pyx_file)
Cython.Compiler.Errors.CompileError: otherkmeans.pyx

How can I fix this?

Barden
  • The pxd file should be included. See https://github.com/scikit-learn/scikit-learn/issues/14847; if one is not included, you should probably submit a bug report. – ead May 02 '22 at 17:00
  • See also https://stackoverflow.com/q/57697577/5769463 – ead May 02 '22 at 17:02
  • Thanks. I looked in /home/miller/miniconda2/envs/bkm9/lib/python3.9/site-packages/sklearn/cluster/, but the only pxd file there was _k_means_common.pxd. Do I need to modify my setup file somehow so that it finds those files? Where are they supposed to be? – Barden May 02 '22 at 19:17
  • I think your assumption that there is a cluster.pxd is wrong. The class is not exported and cannot be reused. – ead May 02 '22 at 21:12
  • I was actually hoping for kmeans.pxd so that I could cimport KMeans – Barden May 02 '22 at 21:34

1 Answer


KMeans appears to be a regular Python class. That means you can't use it as the first base of a cdef class (you can, however, derive a cdef class that has it as a second, third, etc. base).
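To check this on a particular installation, here is a quick sketch (my own, not from the thread) that reports which file defines KMeans and which .pxd files ship in sklearn/cluster. It assumes that a .py result means a plain Python class, since a class from a compiled extension module would report that module's .so/.pyd file instead:

```python
import inspect
import pathlib

import sklearn.cluster
from sklearn.cluster import KMeans

# File of the module that defines KMeans: a .py file means a plain Python
# class; a compiled extension module would report a .so/.pyd file.
defined_in = inspect.getfile(KMeans)

# .pxd files shipped in sklearn/cluster; a cimport needs a matching .pxd.
cluster_dir = pathlib.Path(sklearn.cluster.__file__).parent
pxd_names = sorted(p.name for p in cluster_dir.glob("*.pxd"))

print("KMeans defined in:", defined_in)
print("pxd files in sklearn/cluster:", pxd_names)
```

On the asker's install, the comments report that the only file in that list was _k_means_common.pxd, which does not declare KMeans.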

You can compile regular (i.e. non-cdef) classes with Cython, and their methods are still sped up by Cython. The limitations are:

  • they aren't generated as a C struct, so you can't get fast access to C-level members;
  • they can't have cdef methods (if you care about this, you might be able to use module-level cdef functions instead. Remember, though, that cdef is mainly a calling convention: a regular def function is still sped up by Cython).

Therefore you should ask yourself "do I actually need a cdef/cython.cclass class?" and the answer is probably "no".
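A minimal sketch of the non-cdef route: import KMeans normally instead of cimporting it and drop the @cython.cclass decorator. It assumes the file stays otherkmeans.pyx (the code is also plain Python, so it runs uncompiled) and, because the question doesn't say what kappa controls, it only stores it:

```python
from sklearn.cluster import KMeans


class OtherKMeans(KMeans):
    """KMeans with an extra ``kappa`` hyperparameter.

    A regular (non-cdef) class, so KMeans can be its first base. When the
    module is compiled with Cython, these method bodies are compiled too.
    """

    def __init__(self, kappa, **kwargs):
        # Hypothetical extra parameter: stored only, since its use isn't shown.
        self.kappa = kappa
        # Remaining keyword arguments (n_clusters, n_init, ...) go to KMeans.
        super().__init__(**kwargs)
```

It is used like KMeans, e.g. OtherKMeans(0.5, n_clusters=2, n_init=10).fit(X). Two caveats: scikit-learn's get_params() only reports parameters named explicitly in __init__, so settings passed through **kwargs are invisible to get_params(), clone() and the repr (listing the KMeans parameters explicitly avoids this). Also, setup1.py passes compiler_directives to setup() instead of cythonize(), which is why the language_level FutureWarning still appears; cythonize("otherkmeans.pyx", compiler_directives={"language_level": "3"}) applies it.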

DavidW