
Context

Contents of the project root directory (project-name):

.
├── data
│   ├── locations
│   │   ├── location_group_a
│   │   ├── location_group_b
│   │   ├── location_group_c
│   │   ├── location_group_d
│   │   └── pickles
│   └── reviews
│       ├── location_group_a
│       ├── location_group_b
│       ├── location_group_c
│       ├── location_group_d
│       └── pickles
├── project_name <- Contents of this subdirectory are shown below.
├── logs
├── secrets
└── tests

Contents of the project_name subdirectory:

.
├── __init__.py
├── common_utils.py
├── dash_app.py
├── data_download.py
├── data_prep.py
├── data_store.py
├── google_api.py
├── nlp.py
└── scratchpad.ipynb

Contents of the first cell in scratchpad.ipynb notebook:

import sys

import data_prep

df = data_prep.get_reviews_dataframe()

print(sys.path)

Contents of sys.path (print in scratchpad.ipynb notebook):

['/Users/my-username/Code/project-name/project_name',
'/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python39.zip',
'/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9',
'/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload',
'',
'/Users/my-username/Code/project-name/.venv/lib/python3.9/site-packages',
'/Users/my-username/Code/my-project']

(Spaces have been replaced with newlines for improved readability.)

I'm using Poetry to manage my Python project, and I'm running Jupyter via Visual Studio Code.

Problem

The function data_prep.get_reviews_dataframe() reads files from my-project/data.

Everything works as expected when I import and call that function in a "normal" Python script (my-project/my_project/nlp.py); that is, the files within my-project/data are found.

However, when I import and call that same function from within a Jupyter notebook (my-project/my_project/scratchpad.ipynb), functions called by data_prep.get_reviews_dataframe() cannot "see" the files within the directory my-project/data, which causes subsequent function calls to fail.

Here is the stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/my-username/Code/my-project/my_project/scratchpad.ipynb Cell 2' in <cell line: 1>()
----> 1 df = data_prep.get_reviews_dataframe()

File ~/Code/my-project/my_project/data_prep.py:8, in get_reviews_dataframe()
      6 def get_reviews_dataframe():
      7     '''Get a tidy data frame with correct data types containing all reviews.'''
----> 8     df = data_store.load_pickle(json_directory="data/reviews",
      9                                 pickle_directory="data/reviews/pickles",
     10                                 pickle_filename="reviews.pkl")
     11     df = split_review_comment(df, "comment")
     12     df = extract_ids_from_field(df, "name")

File ~/Code/my-project/my_project/data_store.py:55, in load_pickle(json_directory, pickle_directory, pickle_filename, update_pickle, overwrite_pickle)
     53 latest_pickle_filename = get_latest_pickle_filename(pickle_directory)
     54 if latest_pickle_filename == None or update_pickle == True:
---> 55     filename = json_to_pickle(
     56         json_path=json_directory,
     57         pickle_path=pickle_directory,
     58         pickle_filename=pickle_filename,
     59         force_update=overwrite_pickle)
     60 else:
     61     filename = f"{pickle_directory}/{latest_pickle_filename}"

File ~/Code/my-project/my_project/data_store.py:107, in json_to_pickle(json_path, pickle_path, pickle_filename, force_update)
    105         with open(path, "r") as file:
    106             data_frames.append(pd.json_normalize(json.load(file)))
--> 107     pd.concat(data_frames).to_pickle(out_path, compression="infer")
    108 else:
    109     pprint(f"Using existing pickle file ({out_path})")

File ~/Code/my-project/.venv/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)

File ~/Code/my-project/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:347, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    143 @deprecate_nonkeyword_arguments(version=None, allowed_args=["objs"])
    144 def concat(
    145     objs: Iterable[NDFrame] | Mapping[Hashable, NDFrame],
   (...)
    154     copy: bool = True,
    155 ) -> DataFrame | Series:
    156     """
    157     Concatenate pandas objects along a particular axis with optional set logic
    158     along the other axes.
   (...)
    345     ValueError: Indexes have overlapping values: ['a']
    346     """
--> 347     op = _Concatenator(
    348         objs,
    349         axis=axis,
    350         ignore_index=ignore_index,
    351         join=join,
    352         keys=keys,
    353         levels=levels,
    354         names=names,
    355         verify_integrity=verify_integrity,
    356         copy=copy,
    357         sort=sort,
    358     )
    360     return op.get_result()

File ~/Code/my-project/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:404, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    401     objs = list(objs)
    403 if len(objs) == 0:
--> 404     raise ValueError("No objects to concatenate")
    406 if keys is None:
    407     objs = list(com.not_none(*objs))

ValueError: No objects to concatenate
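
The fact that this is a ValueError rather than a FileNotFoundError is consistent with the list of input files simply being empty. Below is a minimal sketch of how that could happen, assuming json_to_pickle discovers its inputs with a glob-style lookup on the relative json_directory. The real implementation is not shown above, so find_json_files and the temporary directory layout are hypothetical stand-ins:

```python
import glob
import os
import tempfile


def find_json_files(json_directory):
    """Hypothetical stand-in for the file discovery step in json_to_pickle."""
    return sorted(glob.glob(os.path.join(json_directory, "*.json")))


with tempfile.TemporaryDirectory() as project_root:
    # Recreate a tiny version of the layout: <root>/data/reviews and <root>/my_project.
    reviews_dir = os.path.join(project_root, "data", "reviews")
    package_dir = os.path.join(project_root, "my_project")
    os.makedirs(reviews_dir)
    os.makedirs(package_dir)
    with open(os.path.join(reviews_dir, "a.json"), "w") as f:
        f.write("[]")

    original_cwd = os.getcwd()
    try:
        # Working directory = project root: the relative path matches a file.
        os.chdir(project_root)
        found_from_root = find_json_files("data/reviews")

        # Working directory = package directory: the same relative path matches
        # nothing, and glob returns an empty list without raising an error.
        os.chdir(package_dir)
        found_from_package = find_json_files("data/reviews")
    finally:
        os.chdir(original_cwd)

print(len(found_from_root), len(found_from_package))
```

If the lookup behaves like this, an empty list would reach pd.concat and produce the "No objects to concatenate" error seen above.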

Following the advice in this comment, I have tried to add the parent directory to sys.path:

sys.path.insert(0, "..")
sys.path.insert(0, "../data")

%load_ext autoreload
%autoreload 2

I have also tried using sys.path.append in the same way.

My path is indeed modified after calling those functions, but the same problem persists.
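
For context, sys.path only controls where Python looks for modules to import. Relative file paths such as "data/reviews" are resolved against the current working directory, which sys.path does not influence. A small sketch illustrating this, using only the standard library:

```python
import os
import sys

# Where a relative data path points before touching sys.path.
cwd_before = os.getcwd()
resolved_before = os.path.abspath("data/reviews")

# The same modifications tried above.
sys.path.insert(0, "..")
sys.path.insert(0, "../data")

# The working directory, and therefore the resolved path, is unchanged.
cwd_after = os.getcwd()
resolved_after = os.path.abspath("data/reviews")

print(cwd_before == cwd_after)
print(resolved_before == resolved_after)
```

Printing os.getcwd() in both the notebook and the script is a quick way to check whether the two are started from different directories.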

Question

How can I make Jupyter "see" the files found within my-project/data?

sys.path.append and sys.path.insert don't seem to work.
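
One possible direction, sketched here under assumptions, is to build the data paths from the location of the module file instead of relying on the working directory. The helper names project_root_from and data_path are hypothetical and not part of the project, and the sketch assumes the package directory sits directly under the project root, as in the tree above:

```python
from pathlib import Path


def project_root_from(module_file):
    """Return the directory two levels above module_file (hypothetical helper)."""
    return Path(module_file).resolve().parent.parent


def data_path(module_file, *parts):
    """Build a path under <project root>/data, independent of the working directory."""
    return project_root_from(module_file).joinpath("data", *parts)


# Illustration with the module path taken from the stack trace.
example = data_path(
    "/Users/my-username/Code/my-project/my_project/data_prep.py",
    "reviews",
    "pickles",
)
print(example)
```

Inside data_prep.py this could be called as data_path(__file__, "reviews"), so the result would no longer depend on where the notebook kernel was started.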

leifericf
  • Here are three suggestions for you: (1) You can use the `cd` command to enter the Jupyter directory where you need to work. (2) Did you try adding a snippet to import the path: `import os; import sys; module_path = os.path.abspath(os.path.join('..'))`, followed by `if module_path not in sys.path: sys.path.append(module_path)`? (3) Also, have you imported the corresponding content in `__init__.py`? – JialeDu May 26 '22 at 07:04
  • You can try to create a new notebook and do it again. – JialeDu May 26 '22 at 08:40

0 Answers