I have some .pkl files inside a folder (say data_pkl) in a pip-installable package, and I want to load these files from a .py script outside the package. However, I'm unsure about the following:

  1. Should I write a MANIFEST.in file?
  2. Are there any changes that have to be made in the setup.py file?
  3. Do I need to put an __init__.py inside the data_pkl folder?
  4. How can I import the .pkl files in a Python script using the package?

EDIT: set include_package_data=True in setup.py.

If set to True, this tells setuptools to automatically include any data files it finds inside your package directories, that are either under CVS or Subversion control, or which are specified by your MANIFEST.in file. This answers 1 and 2.

Lawhatre
  • To read "package data", use [_`importlib.resources`_](https://docs.python.org/3/library/importlib.html#module-importlib.resources) or [_`pkgutil`_](https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data). – sinoroc Jun 19 '21 at 15:27

1 Answer

A .pkl file most likely contains data serialized with Python's pickle module. It can't be imported; you have to deserialize it.

import pickle
with open("data.pkl", "rb") as f:  # binary mode; the file is closed automatically
    data = pickle.load(f)

As mentioned in another answer, you can wrap this in a Python module.

# filename: data.py
import pickle

def load_data(filename):
    """Deserialize and return the object stored in the given pickle file."""
    with open(filename, "rb") as f:
        return pickle.load(f)
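To try such a wrapper without an existing file, here is a minimal round-trip sketch. The dictionary contents and the temporary file location are made up for illustration; only `pickle.dump` and `pickle.load` are doing the real work.

```python
import os
import pickle
import tempfile


def load_data(filename):
    """Deserialize and return the object stored in a pickle file."""
    with open(filename, "rb") as f:
        return pickle.load(f)


# Hypothetical sample object, used only for this demonstration.
original = {"labels": ["a", "b"], "scores": [0.1, 0.9]}

# Write the object to a temporary .pkl file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    with open(path, "wb") as f:
        pickle.dump(original, f)
    restored = load_data(path)

print(restored == original)
```

The loaded object compares equal to the one that was dumped, which is all "deserializing" means here.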

If your .pkl files are inside a Python package, you can retrieve them using pkg_resources.

import pickle
import pkg_resources

def load_data(resource_name):
    return pickle.load(
        pkg_resources.resource_stream("my_package", resource_name))

In Python >= 3.7, the data can be retrieved with importlib.resources, which avoids depending on a third-party package.

import importlib.resources
import pickle
with importlib.resources.open_binary("my_package.data_folder", "data.pkl") as f:
    data = pickle.load(f)
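For comparison, here is a self-contained sketch of two other standard-library routes: `pkgutil.get_data`, and `importlib.resources.files` (available from Python 3.9). It builds a throwaway package in a temporary directory so it can run anywhere; the names `demo_pkg` and `data_pkl` and the sample dictionary are assumptions for illustration, not part of any real package.

```python
import importlib
import importlib.resources
import pathlib
import pickle
import pkgutil
import sys
import tempfile

# Build a throwaway package on disk so the example is self-contained.
# "demo_pkg" and "data_pkl" are made-up names for illustration.
root = pathlib.Path(tempfile.mkdtemp())
data_dir = root / "demo_pkg" / "data_pkl"
data_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
original = {"weights": [1, 2, 3]}
(data_dir / "data.pkl").write_bytes(pickle.dumps(original))

# Make the temporary package importable, as an installed package would be.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Option 1: pkgutil.get_data returns the raw bytes of the resource.
raw = pkgutil.get_data("demo_pkg", "data_pkl/data.pkl")
from_pkgutil = pickle.loads(raw)

# Option 2 (Python >= 3.9): importlib.resources.files returns a
# path-like object that can be traversed with "/" and opened.
resource = importlib.resources.files("demo_pkg") / "data_pkl" / "data.pkl"
with resource.open("rb") as f:
    from_files = pickle.load(f)

print(from_pkgutil == original, from_files == original)
```

In this sketch the data_pkl folder has no `__init__.py`: both calls address it as a subdirectory of `demo_pkg` rather than as a subpackage. With `open_binary("my_package.data_folder", ...)`, by contrast, the data folder is named as a package.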
Balaïtous
  • No need to rely on `pkg_resources`, which is part of a third-party library (_setuptools_), when the same functionality is available in the standard library, in `importlib.resources` or `pkgutil`. – sinoroc Jun 19 '21 at 15:29
  • That's right, but only for python >= 3.7. – Balaïtous Jun 19 '21 at 17:27
  • [_`pkgutil.get_data()`_](https://docs.python.org/3/library/pkgutil.html#pkgutil.get_data) seems compatible with everything. See: https://github.com/wimglenn/resources-example and https://stackoverflow.com/a/58941536 – sinoroc Jun 19 '21 at 18:04
  • @sinoroc The second resource was also very useful to me. – Lawhatre Jun 20 '21 at 02:45