Context
Contents of the project root directory (project-name):
.
├── data
│ ├── locations
│ │ ├── location_group_a
│ │ ├── location_group_b
│ │ ├── location_group_c
│ │ ├── location_group_d
│ │ └── pickles
│ └── reviews
│ ├── location_group_a
│ ├── location_group_b
│ ├── location_group_c
│ ├── location_group_d
│ └── pickles
├── project_name <- Contents of this subdirectory are shown below.
├── logs
├── secrets
└── tests
Contents of the project_name subdirectory:
.
├── __init__.py
├── common_utils.py
├── dash_app.py
├── data_download.py
├── data_prep.py
├── data_store.py
├── google_api.py
├── nlp.py
└── scratchpad.ipynb
Contents of the first cell in the scratchpad.ipynb notebook:
import sys  # needed for the sys.path print below
import data_prep
df = data_prep.get_reviews_dataframe()
print(sys.path)
Contents of sys.path (printed in the scratchpad.ipynb notebook):
['/Users/my-username/Code/project-name/project_name',
'/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python39.zip',
'/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9',
'/usr/local/Cellar/python@3.9/3.9.12/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload',
'',
'/Users/my-username/Code/project-name/.venv/lib/python3.9/site-packages',
'/Users/my-username/Code/my-project']
(Spaces have been replaced with newlines for improved readability.)
I'm using Poetry to manage my Python project, and I'm running Jupyter via Visual Studio Code.
Problem
The function data_prep.get_reviews_dataframe() uses files from my-project/data.
Everything works as expected when I import and call that function in a "normal" Python script (my-project/my_project/nlp.py): the files within my-project/data are found.
However, when I import and call that same function from within a Jupyter notebook (my-project/my_project/scratchpad.ipynb), the functions called by data_prep.get_reviews_dataframe() cannot "see" the files within my-project/data, which causes subsequent function calls to fail.
Here is the stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/Users/my-username/Code/my-project/my_project/scratchpad.ipynb Cell 2' in <cell line: 1>()
----> 1 df = data_prep.get_reviews_dataframe()
File ~/Code/my-project/my_project/data_prep.py:8, in get_reviews_dataframe()
6 def get_reviews_dataframe():
7 '''Get a tidy data frame with correct data types containing all reviews.'''
----> 8 df = data_store.load_pickle(json_directory="data/reviews",
9 pickle_directory="data/reviews/pickles",
10 pickle_filename="reviews.pkl")
11 df = split_review_comment(df, "comment")
12 df = extract_ids_from_field(df, "name")
File ~/Code/my-project/my_project/data_store.py:55, in load_pickle(json_directory, pickle_directory, pickle_filename, update_pickle, overwrite_pickle)
53 latest_pickle_filename = get_latest_pickle_filename(pickle_directory)
54 if latest_pickle_filename == None or update_pickle == True:
---> 55 filename = json_to_pickle(
56 json_path=json_directory,
57 pickle_path=pickle_directory,
58 pickle_filename=pickle_filename,
59 force_update=overwrite_pickle)
60 else:
61 filename = f"{pickle_directory}/{latest_pickle_filename}"
File ~/Code/my-project/my_project/data_store.py:107, in json_to_pickle(json_path, pickle_path, pickle_filename, force_update)
105 with open(path, "r") as file:
106 data_frames.append(pd.json_normalize(json.load(file)))
--> 107 pd.concat(data_frames).to_pickle(out_path, compression="infer")
108 else:
109 pprint(f"Using existing pickle file ({out_path})")
File ~/Code/my-project/.venv/lib/python3.9/site-packages/pandas/util/_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
305 if len(args) > num_allow_args:
306 warnings.warn(
307 msg.format(arguments=arguments),
308 FutureWarning,
309 stacklevel=stacklevel,
310 )
--> 311 return func(*args, **kwargs)
File ~/Code/my-project/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:347, in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
143 @deprecate_nonkeyword_arguments(version=None, allowed_args=["objs"])
144 def concat(
145 objs: Iterable[NDFrame] | Mapping[Hashable, NDFrame],
(...)
154 copy: bool = True,
155 ) -> DataFrame | Series:
156 """
157 Concatenate pandas objects along a particular axis with optional set logic
158 along the other axes.
(...)
345 ValueError: Indexes have overlapping values: ['a']
346 """
--> 347 op = _Concatenator(
348 objs,
349 axis=axis,
350 ignore_index=ignore_index,
351 join=join,
352 keys=keys,
353 levels=levels,
354 names=names,
355 verify_integrity=verify_integrity,
356 copy=copy,
357 sort=sort,
358 )
360 return op.get_result()
File ~/Code/my-project/.venv/lib/python3.9/site-packages/pandas/core/reshape/concat.py:404, in _Concatenator.__init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
401 objs = list(objs)
403 if len(objs) == 0:
--> 404 raise ValueError("No objects to concatenate")
406 if keys is None:
407 objs = list(com.not_none(*objs))
ValueError: No objects to concatenate
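The final error suggests that the list passed to pd.concat was empty, i.e. no JSON files were found under "data/reviews". A minimal sketch of how that could happen, assuming the relative path is resolved against the current working directory and that the notebook kernel starts in the notebook's own folder rather than the project root. The helper count_json_files and the temporary directory layout are hypothetical stand-ins, not code from my project:

```python
import glob
import os
import tempfile
from pathlib import Path


def count_json_files(relative_dir: str) -> int:
    """Count *.json files under relative_dir, resolved against the current working directory."""
    pattern = os.path.join(relative_dir, "**", "*.json")
    return len(glob.glob(pattern, recursive=True))


# Throwaway layout that mimics the project: <root>/data/reviews and <root>/my_project.
with tempfile.TemporaryDirectory() as root:
    root_path = Path(root)
    group_dir = root_path / "data" / "reviews" / "location_group_a"
    group_dir.mkdir(parents=True)
    (group_dir / "r1.json").write_text("[]")
    (root_path / "my_project").mkdir()

    original_cwd = os.getcwd()
    try:
        os.chdir(root_path)  # like running a script from the project root
        found_from_root = count_json_files("data/reviews")
        os.chdir(root_path / "my_project")  # like a kernel started next to the notebook
        found_from_package = count_json_files("data/reviews")
    finally:
        os.chdir(original_cwd)  # restore before the temporary directory is removed

print(found_from_root, found_from_package)
```

Running print(os.getcwd()) in both the script and the notebook would show whether the two really start in different directories.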
Following the advice in this comment, I have tried to add the parent directory to sys.path:
sys.path.insert(0, "..")
sys.path.insert(0, "../data")
%load_ext autoreload
%autoreload 2
I have also tried using sys.path.append in the same way.
sys.path is indeed modified after these calls, but the same problem persists.
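For reference, sys.path is the module search path used by import; it is not consulted when a file is opened by a relative path. A small check along these lines (a sketch, using only the standard library) shows that the inserts above leave the working directory, and therefore the location "data/reviews" resolves to, unchanged:

```python
import os
import sys

# Where does the relative path point before touching sys.path?
cwd_before = os.getcwd()
resolved_before = os.path.abspath("data/reviews")

# The same modifications I tried in the notebook.
sys.path.insert(0, "..")
sys.path.insert(0, "../data")

# The import search path changed, but relative file paths resolve exactly as before.
cwd_after = os.getcwd()
resolved_after = os.path.abspath("data/reviews")

print(cwd_before == cwd_after)
print(resolved_before == resolved_after)
```

If both comparisons print True, the sys.path changes cannot be what decides whether the data files are found.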
Question
How can I make Jupyter "see" the files found within my-project/data?
sys.path.append and sys.path.insert don't seem to work.