When I try to use parallel_apply() from pandarallel, it raises a NameError for the function I pass in, even though the function has been declared. If I set the axis parameter, it reports that the parameter is not supposed to be there. With the normal apply() there is no issue. The odd part is that the same code works when I run it in the Jupyter Notebook on the server machine.
df_transport_hourly = df_db_transport.groupby(['siteId', 'port', 'systemName', df_db_transport['startedAt'].dt.date, df_db_transport['startedAt'].dt.hour]).parallel_apply(lambda x: calc_transport(x, 'h'))
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "c:\Users\Hush\anaconda3\envs\py10\lib\multiprocessing\pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "c:\Users\Hush\anaconda3\envs\py10\lib\multiprocessing\pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\core.py", line 158, in __call__
results = self.work_function(
File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\data_types\dataframe_groupby.py", line 40, in work
return [compute_result(key, df) for key, df in data]
File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\data_types\dataframe_groupby.py", line 40, in <listcomp>
return [compute_result(key, df) for key, df in data]
File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\data_types\dataframe_groupby.py", line 34, in compute_result
result = user_defined_function(
File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\progress_bars.py", line 214, in closure
return user_defined_function(
File "C:\Users\Hush\AppData\Local\Temp\ipykernel_24340\379370115.py", line 1, in <lambda>
NameError: name 'calc_transport' is not defined
"""
The above exception was the direct cause of the following exception:
...
772 return self._value
773 else:
--> 774 raise self._value
NameError: name 'calc_transport' is not defined
Note that the function is actually declared in the notebook.
Also, when testing other functions that parallel_apply() did manage to run, I found that some libraries, such as datetime or bson, raise an error if they are imported outside the function and only work when imported inside it. Those functions are also unable to access global variables.
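To make the import workaround concrete, here is a minimal sketch of the pattern that did work for me: the import lives inside the function body instead of at the top of the notebook. The function name calc_duration_hours and the endedAt field are hypothetical and only there for illustration; the sketch calls the function directly and does not involve pandarallel.

```python
def calc_duration_hours(record):
    """Hypothetical helper: hours between two ISO timestamps in a dict."""
    # The import sits inside the function body, so the name is resolved
    # in whichever process ends up executing this function.
    from datetime import datetime

    start = datetime.fromisoformat(record["startedAt"])
    end = datetime.fromisoformat(record["endedAt"])  # "endedAt" is an assumed field
    return (end - start).total_seconds() / 3600.0


# Plain call with made-up sample data, just to show the function is self-contained.
sample = {"startedAt": "2023-01-01T10:00:00", "endedAt": "2023-01-01T12:30:00"}
hours = calc_duration_hours(sample)
```

The same idea would apply to bson or any other module the function needs: nothing inside the body relies on names defined elsewhere in the notebook.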
Note that none of this is an issue when running on the server's Jupyter. I tried changing the Python version to match the one running on the server, and I also tried two other Python versions; all of them gave the same error.
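Since the Python version alone does not seem to explain the difference, a small sketch like the one below could be run in both notebooks to compare the two environments side by side. It uses only the standard library; the helper name describe_environment and the choice of packages to report are my own assumptions. It also prints the multiprocessing start method, which is one setting that can differ between operating systems, though I have not confirmed that it is related to the error.

```python
import importlib.metadata
import multiprocessing
import platform


def describe_environment(packages=("pandas", "pandarallel")):
    """Hypothetical helper: collect a few facts worth comparing across machines."""
    info = {
        "python": platform.python_version(),
        "platform": platform.system(),
        # Default way worker processes are started on this machine.
        "start_method": multiprocessing.get_start_method(),
    }
    for name in packages:
        try:
            info[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            info[name] = None
    return info


env = describe_environment()
for key, value in env.items():
    print(f"{key}: {value}")
```

Running this on the local machine and on the server and diffing the two outputs should show whether anything besides the Python version differs.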