
My code structure is as follows:

myMLCode
├── main.py
├── ML_lib
│   ├── __init__.py
│   ├── core.py
│   ├── set1
│   │   ├── __init__.py
│   │   ├── mymod1.py
│   │   └── mymod2.py
│   └── set2
│       ├── __init__.py
│       ├── mymod3.py
│       └── mymod4.py
├── config
│   ├── config1.yml
│   └── config2.yml
├── models
│   ├── model1.h5
│   └── model2.h5
└── setup.py

What I would like to do is build a wheel from the Cythonized code of this whole package and be able to run the code seamlessly.

The expectation is to run it with `python main.py`. I also want to edit the config files and update the model files from time to time, and keep using the package.

So far I have managed this with the following setup.py file:

from Cython.Distutils import build_ext
from Cython.Build import cythonize
from setuptools.extension import Extension
from setuptools.command.build_py import build_py as build_py_orig
from pathlib import Path
from setuptools import find_packages, setup, Command
import os
import shutil


class MyBuildExt(build_ext):
    def run(self):
        build_ext.run(self)

        build_dir = Path(self.build_lib)
        root_dir = Path(__file__).parent

        target_dir = build_dir if not self.inplace else root_dir

        # Copy the package __init__.py files next to the compiled modules
        self.copy_init_file('ML_lib/__init__.py', root_dir, target_dir)
        self.copy_init_file('ML_lib/set1/__init__.py', root_dir, target_dir)
        self.copy_init_file('ML_lib/set2/__init__.py', root_dir, target_dir)

    # Must be a method of the class; renamed so it does not shadow
    # distutils' Command.copy_file(infile, outfile, ...)
    def copy_init_file(self, path, source_dir, destination_dir):
        if not (source_dir / path).exists():
            return

        shutil.copyfile(str(source_dir / path), str(destination_dir / path))

extensions = [
    # ML_lib/core.py is already matched by the "ML_lib.*" pattern below
    Extension("ML_lib.set1.*", ["ML_lib/set1/*.py"]),
    Extension("ML_lib.set2.*", ["ML_lib/set2/*.py"]),
    Extension("ML_lib.*", ["ML_lib/*.py"]),
]

setup(
    name="myMLCode",
    version="0.0.1",
    author="myself",
    description="This is compiled ML code",
    ext_modules=cythonize(
        extensions,
        build_dir="build",
        compiler_directives=dict(always_allow_keywords=True),
    ),
    data_files=[
        ('config', ['config/config1.yml', 'config/config2.yml']),
        ('models', ['models/model1.h5', 'models/model2.h5']),
    ],
    cmdclass={
        'build_ext': MyBuildExt
    },
    entry_points={
    },
)

This makes a wheel file which contains the following:

myMLCode-0.0.1-cp37-cp37m-linux_x86_64.whl
------------------------------------------
'main.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/__init__.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/__init__.py'
'ML_lib/core.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/set1/mymod1.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/set1/mymod2.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/set1/__init__.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/set1/__init__.py'
'ML_lib/set2/mymod3.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/set2/mymod4.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/set2/__init__.cpython-37m-x86_64-linux-gnu.so'
'ML_lib/set2/__init__.py'
'myMLCode-0.0.1.data/data/config/config1.yml'
'myMLCode-0.0.1.data/data/config/config2.yml'
'myMLCode-0.0.1.data/data/models/model1.h5'
'myMLCode-0.0.1.data/data/models/model2.h5'
'myMLCode-0.0.1.dist-info/METADATA'
'myMLCode-0.0.1.dist-info/WHEEL'
'myMLCode-0.0.1.dist-info/top_level.txt'
'myMLCode-0.0.1.dist-info/RECORD'
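Entries under `myMLCode-0.0.1.data/data/` are not part of any importable package. When pip installs the wheel, it copies them into the interpreter's "data" install directory (typically `sys.prefix`, e.g. the virtualenv root), so `config/` and `models/` end up outside `ML_lib`, and the `.data` directory itself no longer exists after installation. A small sketch that prints that location with the standard-library `sysconfig` module, assuming the default install scheme (no `--user` or `--prefix`); the helper name `data_files_root` is made up for illustration:

```python
import sys
import sysconfig
from pathlib import Path


def data_files_root() -> Path:
    """Directory where pip installs a wheel's `<name>-<version>.data/data/` tree.

    Assumes the interpreter's default install scheme (no --user/--prefix).
    """
    return Path(sysconfig.get_path("data"))


if __name__ == "__main__":
    root = data_files_root()
    # With data_files=[('config', [...]), ('models', [...])] the files land here:
    print(root / "config" / "config1.yml")
    print(root / "models" / "model1.h5")
    print("equals sys.prefix:", root == Path(sys.prefix))
```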

I then installed the wheel with pip install and ran pip list to check that it was installed. However, when I opened a Python 3.7 interpreter to use it, I got an import error:

[user@userhome ~]$ pip3.7 list
Package            Version
------------------ -------
appdirs            1.4.4
distlib            0.3.1
filelock           3.0.12
importlib-metadata 4.0.1
pip                20.1.1
setuptools         47.1.0
six                1.15.0
typing-extensions  3.7.4.3
virtualenv         20.4.4
myMLCode           0.0.1
zipp               3.4.1
[user@userhome ~]$ python3.7
Python 3.7.9 (default, Apr 27 2021, 07:49:13)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import myMLCode
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'myMLCode'
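`pip list` shows distribution names (the `name=` passed to `setup()`), while `import` looks for top-level modules or packages on `sys.path`. According to the wheel listing, the top-level names installed are `ML_lib` and `main`; nothing is called `myMLCode`. A quick check with `importlib.util.find_spec`, which locates a top-level module without importing it (the helper `importable` is hypothetical):

```python
import importlib.util


def importable(name: str) -> bool:
    """Return True if the top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "myMLCode" is only the distribution (wheel) name; the wheel ships ML_lib.
    for name in ("myMLCode", "ML_lib"):
        print(name, "->", importable(name))
```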

I tried unzipping the wheel and running the code from the .so files directly. That works, except for the references to the config and model files: the wheel puts them under myMLCode-0.0.1.data/data/config and myMLCode-0.0.1.data/data/models. As a hack, I changed all relative paths in the source code to point to these new locations. That works, but it is not an elegant solution, because the plain Python code then stops working, since it doesn't know about these new folders.
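One way to avoid hard-coding the wheel's `.data` layout, in line with the `package_data` suggestion from the comments, is to move `config/` and `models/` inside `ML_lib/`, ship them with `package_data`, and resolve them relative to the module's `__file__`. With Cython's default settings on Python 3, both a plain `ML_lib/core.py` and its compiled `.so` have `__file__` set, so the same lookup should work before and after cythonizing. A minimal sketch under that assumption; `resource_path` is a hypothetical helper name:

```python
from pathlib import Path


def resource_path(module_file: str, *parts: str) -> Path:
    """Resolve a file shipped next to `module_file` (pass the module's __file__).

    Works the same for ML_lib/core.py and for the compiled
    ML_lib/core.cpython-37m-x86_64-linux-gnu.so, since both set __file__.
    """
    path = Path(module_file).resolve().parent.joinpath(*parts)
    if not path.exists():
        raise FileNotFoundError(f"package resource not found: {path}")
    return path


# Hypothetical usage inside ML_lib/core.py, assuming config/ and models/
# were moved into ML_lib/ and are listed in package_data:
#   CONFIG = resource_path(__file__, "config", "config1.yml")
#   MODEL = resource_path(__file__, "models", "model1.h5")
```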

Any pointers would be really helpful.

N2M
  • I have already gone through several related links, but could not find a complete answer yet: https://stackoverflow.com/questions/39499453/package-only-binary-compiled-so-files-of-a-python-library-compiled-with-cython/56043918#56043918 https://stackoverflow.com/questions/56024286/package-only-cythonized-binary-python-files-and-resource-data-but-ignoring-pytho – N2M May 12 '21 at 10:42
  • First thing is to manage packaging the wheel without cythonizing. Why doesn't your setup declare `packages` list? Why are you using broken `data_files` instead of `package_data`? Why is `main.py` outside of `ML_lib` and not packaged? – hoefling May 12 '21 at 21:04
  • @hoefling I added data_files following the answer at https://stackoverflow.com/questions/24347450/how-do-you-add-additional-files-to-a-wheel?noredirect=1&lq=1 and removed packages based on https://bucharjan.cz/blog/using-cython-to-protect-a-python-codebase.html as well. I also tried the setup from your answer at https://stackoverflow.com/questions/39499453/package-only-binary-compiled-so-files-of-a-python-library-compiled-with-cython and got the same package, but could not get the data files into it. – N2M May 13 '21 at 10:19
  • The setup from my answer works, but your packaging right now is a mess. Again, my advice: write a setup script first that works without cythonizing. Don't use `data_files`. Include sources via `packages=find_packages()`. Don't import `myMLCode` since you don't have a package or module named like that. Start with easy things and add complex stuff incrementally. – hoefling May 13 '21 at 15:10
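Once a plain, non-cythonized setup.py with `packages=find_packages()` and `package_data` builds a correct wheel, the remaining step in the approach discussed in the linked answers is to keep `build_py` from copying the `.py` files that were compiled, so the wheel ships only the `.so` files, the `__init__.py` files and the data files. A sketch of just the filtering decision, assuming the hypothetical pattern lists below mirror the `Extension` globs in the question; the hookup into setup.py is shown only as comments:

```python
import fnmatch

# Hypothetical patterns mirroring the Extension globs in the question:
# every .py under ML_lib is compiled, except the __init__.py files.
CYTHON_SOURCES = ["ML_lib/*.py", "ML_lib/set1/*.py", "ML_lib/set2/*.py"]
CYTHON_EXCLUDES = ["**/__init__.py"]


def keep_as_source(module):
    """Decide whether build_py should copy this .py file into the wheel.

    `module` is a (package, module_name, file_path) tuple, the shape that
    setuptools' build_py.find_package_modules() returns. Files matched by a
    Cython source pattern are dropped because their compiled .so replaces
    them; excluded files (the __init__.py files) and everything else are kept.
    """
    _package, _name, path = module
    if any(fnmatch.fnmatchcase(path, pat) for pat in CYTHON_EXCLUDES):
        return True
    return not any(fnmatch.fnmatchcase(path, pat) for pat in CYTHON_SOURCES)


# Sketch of the hookup in setup.py:
#   class build_py(build_py_orig):
#       def find_package_modules(self, package, package_dir):
#           modules = super().find_package_modules(package, package_dir)
#           return [m for m in modules if keep_as_source(m)]
#
#   setup(..., packages=find_packages(), cmdclass={'build_py': build_py})
```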

1 Answer


According to your folder structure, your package name is ML_lib. The wheel (distribution) name, myMLCode, is not the same as the name you import. If you also have a module named myMLCode inside the package (ML_lib/myMLCode.py), you can re-export it by adding the following to ML_lib/__init__.py:

from .myMLCode import * 

Then import the module in Python:

import ML_lib
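To check which import name an installed distribution actually provides, you can read the `top_level.txt` file that the wheel listing shows under `myMLCode-0.0.1.dist-info/`. A sketch using `importlib.metadata` (Python 3.8+), falling back to the `importlib-metadata` backport that appears in the `pip list` output for Python 3.7; `top_level_names` is a hypothetical helper:

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: the importlib-metadata backport
    import importlib_metadata as metadata


def top_level_names(dist_name):
    """Return the import names listed in a distribution's top_level.txt,
    or [] if the distribution is not installed or has no such file."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return []
    text = dist.read_text("top_level.txt") or ""
    return text.split()


if __name__ == "__main__":
    # For the wheel in the question this would presumably list ML_lib
    # (and main), not myMLCode.
    print(top_level_names("myMLCode"))
```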
yushulx