
When I call parallel_apply(), it raises a NameError for my function, even though the function has been declared. Passing the axis parameter also fails, with an error saying the argument is not supposed to be there. The normal apply() works without any issue. The odd thing is that the same code works in the Jupyter Notebook on the server machine.

df_transport_hourly = df_db_transport.groupby(['siteId', 'port', 'systemName', df_db_transport['startedAt'].dt.date, df_db_transport['startedAt'].dt.hour]).parallel_apply(lambda x: calc_transport(x, 'h'))
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "c:\Users\Hush\anaconda3\envs\py10\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\Users\Hush\anaconda3\envs\py10\lib\multiprocessing\pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\core.py", line 158, in __call__
    results = self.work_function(
  File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\data_types\dataframe_groupby.py", line 40, in work
    return [compute_result(key, df) for key, df in data]
  File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\data_types\dataframe_groupby.py", line 40, in <listcomp>
    return [compute_result(key, df) for key, df in data]
  File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\data_types\dataframe_groupby.py", line 34, in compute_result
    result = user_defined_function(
  File "c:\Users\Hush\anaconda3\envs\py10\lib\site-packages\pandarallel\progress_bars.py", line 214, in closure
    return user_defined_function(
  File "C:\Users\Hush\AppData\Local\Temp\ipykernel_24340\379370115.py", line 1, in <lambda>
NameError: name 'calc_transport' is not defined
"""

The above exception was the direct cause of the following exception:
...
    772     return self._value
    773 else:
--> 774     raise self._value

NameError: name 'calc_transport' is not defined

Note that the function is actually declared earlier in the notebook.

Also, when testing other functions that parallel_apply() did manage to run, I found that imports of some libraries, such as datetime or bson, raise an error if they sit outside the function and only work when placed inside the function. Those functions are also unable to access global variables.
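As far as I can tell, the pandarallel README notes that on Windows the function sent to the workers must be self-contained, which would match the symptoms above. The sketch below is only a way to check which outside names a function relies on, using the standard library's `inspect.getclosurevars`. The `calc_transport` body here is a hypothetical stand-in (the real one is not shown in this post), and `external_names` and `calc_transport_standalone` are illustrative names, not part of pandarallel.

```python
import inspect


def external_names(func):
    """Names that func resolves outside its own body (globals and closure cells)."""
    found = inspect.getclosurevars(func)
    return sorted(set(found.globals) | set(found.nonlocals))


# Hypothetical stand-in; the real calc_transport is not shown in this post.
def calc_transport(group, freq):
    return len(group), freq


# Same shape as the lambda passed to parallel_apply() above.
wrapper = lambda x: calc_transport(x, 'h')


# Self-contained variant: the import lives inside the body and the frequency
# is a parameter with a default, so nothing is looked up in the notebook.
def calc_transport_standalone(group, freq='h'):
    import datetime
    bucket = datetime.timedelta(hours=1) if freq == 'h' else datetime.timedelta(days=1)
    return len(group), bucket


print(external_names(wrapper))
print(external_names(calc_transport_standalone))
```

When run as a plain script or notebook cell, the lambda reports `calc_transport` as an outside dependency, while the standalone version reports none. A function shaped like the second one can be passed directly as `parallel_apply(calc_transport_standalone)` without a lambda, which is the only form that has worked here so far.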

NOTE that none of this is an issue when running on the server's Jupyter. I tried changing the Python version to match the one running on the server, and tried 2 different Python versions in total, but they all gave the same error.
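Since matching the Python version did not help, it may be worth comparing other properties of the two machines. The snippet below is a small sketch that collects the operating system, the Python version, and the interpreter's default multiprocessing start method. It assumes the difference is OS-related, which is not confirmed, and pandarallel may select its own start method regardless of this default. `environment_summary` is an illustrative name.

```python
import multiprocessing
import platform


def environment_summary():
    """Collect settings that commonly differ between a laptop and a server."""
    return {
        "os": platform.system(),
        "python": platform.python_version(),
        # Interpreter default; a library may choose a different context.
        "start_method": multiprocessing.get_start_method(),
    }


summary = environment_summary()
print(summary)
```

Running this once locally and once in the server's Jupyter gives two dictionaries that can be compared side by side.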

huhehu
  • Where is `calc_transport` from? – ewokx Apr 13 '23 at 10:01
  • @ewokx It's a function I made. One more thing I noticed: when I did get a function to run, it could not take 2 or more arguments, so basically only `parallel_apply(function)` without the lambda – huhehu Apr 13 '23 at 11:11

0 Answers