
I have a working conda environment with Kedro installed. The environment .yml file is available at link 1. My Kedro pipelines run fine in this environment. However, when I try to install the matplotlib package with conda, I get the following warning:

The following packages are causing the inconsistency:

  • conda-forge/noarch::pyspark==3.2.1=pyhd8ed1ab_0

Conda seems to resolve the inconsistency, since it proposes installing the required packages. But when I then run Kedro in the updated environment, I get this error:

kedro.io.core.DataSetError: Object ParquetDataSet cannot be loaded from kedro.extras.datasets.pandas.

It seems that conda updates some packages to versions that are no longer compatible with Kedro. How can I install the matplotlib package with conda without breaking Kedro?
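One way to check which packages conda actually changed is to snapshot the environment before and after the install and diff the two snapshots. The sketch below assumes the snapshots come from `conda list --export > before.txt` and `conda list --export > after.txt` (lines of the form `name=version=build`, with `#` header comments); running `conda install --dry-run matplotlib` first shows the planned changes without applying them. The function names and the package versions in the usage block are hypothetical, for illustration only.

```python
from typing import Dict, Optional, Tuple


def parse_conda_export(text: str) -> Dict[str, str]:
    """Map package name -> 'version=build' from `conda list --export` output."""
    packages: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header comments conda writes.
        if not line or line.startswith("#"):
            continue
        name, _, rest = line.partition("=")
        packages[name] = rest
    return packages


def diff_snapshots(before: str, after: str) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (old, new)} for packages added, removed, or changed.

    None on either side means the package is absent from that snapshot.
    """
    old, new = parse_conda_export(before), parse_conda_export(after)
    changes: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes


if __name__ == "__main__":
    # Hypothetical snapshots; in practice read before.txt and after.txt.
    before = "# platform: linux-64\npandas=1.3.5=py38_0\npyspark=3.2.1=pyhd8ed1ab_0\n"
    after = (
        "# platform: linux-64\nmatplotlib=3.5.1=py38_0\n"
        "pandas=1.4.0=py38_0\npyspark=3.2.1=pyhd8ed1ab_0\n"
    )
    for name, (old_v, new_v) in diff_snapshots(before, after).items():
        print(f"{name}: {old_v} -> {new_v}")
```

If the diff shows only the new packages (matplotlib and its dependencies) and no upgrades of Kedro-related packages, the breakage is more likely a missing optional dependency than a version conflict.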

Ildar
  • Did you try to change the `requirements.txt` file (in `project_folder/src/`) and run `kedro install --build-reqs`? – Arnaldo Gualberto Feb 07 '22 at 12:28
  • I don't think anything broke, just that you are now using part of the code in the extras that requires additional packages: see https://stackoverflow.com/a/70725359/570918. That is, `ParquetDataSet` does not work unless you co-install either `pyarrow` or `fastparquet` - they are optional dependencies of Pandas. – merv Feb 07 '22 at 16:07
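Following up on the optional-dependency explanation: a quick way to confirm it is to check whether either pandas parquet engine can be imported in the active environment before running Kedro. This is a minimal sketch using only the standard library; the helper names are hypothetical, and it only checks that the modules are importable, not that their versions are compatible with the installed pandas.

```python
import importlib.util
from typing import List

# The two optional parquet engines pandas can use.
PARQUET_ENGINES = ("pyarrow", "fastparquet")


def available_parquet_engines() -> List[str]:
    """Return the parquet engines importable in the current environment."""
    return [
        name for name in PARQUET_ENGINES
        if importlib.util.find_spec(name) is not None
    ]


def check_parquet_support() -> str:
    """Summarize whether ParquetDataSet has an engine to load data with."""
    engines = available_parquet_engines()
    if engines:
        return f"Parquet engines found: {', '.join(engines)}"
    return (
        "No parquet engine found; ParquetDataSet will fail to load. "
        "Install one, e.g. `conda install -c conda-forge pyarrow`."
    )


if __name__ == "__main__":
    print(check_parquet_support())
```

Running this in the environment before and after installing matplotlib would show whether the parquet engine disappeared or was never there.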
