
I am using a Jupyter notebook. The following code uses functions from the MetPy package (`dewpoint_from_relative_humidity` and `most_unstable_cape_cin`) to define a new function, `calc`.

```python
import numpy as np
import xarray as xr
from time import time
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from metpy.units import units
from metpy.calc import cape_cin, dewpoint_from_relative_humidity, parcel_profile, most_unstable_cape_cin, mixed_layer_cape_cin
from metpy.calc import dewpoint_from_relative_humidity
from multiprocessing import Pool
from pytictoc import TicToc  # conda install pytictoc -c ecf


data1 = xr.open_dataset(r'new.nc')
data = data1.sel(latitude=slice(50, 45), longitude=slice(-110, -108))
data.r.values = data.r.values / 100      # relative humidity: percent -> fraction
data.t.values = data.t.values - 273.15   # temperature: K -> degC
data.t.attrs['units'] = 'degree_Celsius'
data.r.attrs['units'] = 'dimensionless'
data.level.attrs['units'] = 'hectopascal'

# Reverse the pressure levels so they are in descending order
data = data.isel(level=slice(None, None, -1))
p = data.level

# Define calc() using the MetPy functions dewpoint_from_relative_humidity and most_unstable_cape_cin
def calc(idata):
    p1 = idata[0]
    t1 = idata[1]
    rh1 = idata[2]

    td1 = dewpoint_from_relative_humidity(t1, rh1)

    cape = most_unstable_cape_cin(p1, t1, td1)
    # print(cape)
    return cape

SP_emp = data.drop_vars(['t'])
SP_emp['cape'] = SP_emp['r'] * 0
SP_emp = SP_emp.drop_vars(['r'])

# Drop the 'level' dimension (leaves a time x lat x lon template for 'cape')
SP_emp = SP_emp.mean(['level'])


from multiprocess import Pool
from pytictoc import TicToc

if __name__ == '__main__':
    pool = Pool(processes=8)
    # num_processors = 4

mucape_res = np.zeros((data.t.time.size, data.t.latitude.size * data.t.longitude.size))  # time * lat * lon
print(mucape_res.shape)

for lat in data.latitude.values:
    for lon in data.longitude.values:
        for tim in data.time.values:
            t = TicToc()
            t.tic()
            Temp = data.t.sel(time=tim, latitude=lat, longitude=lon)
            # print(Temp)
            RH = data.r.sel(time=tim, latitude=lat, longitude=lon)
            # print(RH)
            # print(TD)
            sets = p, Temp, RH
            out = pool.map(calc, sets)
            # out = calc(sets)
            cape_mag = out.magnitude
            SP_emp.cape.loc[dict(time=tim, longitude=lon, latitude=lat)] = cape_mag
            t.toc()
            # pool.close()
```

But when `calc` is then used in the loop through `pool.map`, it raises the following error:

`NameError: name 'dewpoint_from_relative_humidity' is not defined`

Why does this error occur when `dewpoint_from_relative_humidity` has already been imported and is used inside `calc`? Is there a problem with the pool, or with the way a library function is called inside a new function?
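Independently of the `NameError`, the loop has a second problem: `Pool.map(func, iterable)` works like the built-in `map` and calls `func` once per element of `iterable`. With `sets = p, Temp, RH`, the call `pool.map(calc, sets)` therefore runs `calc(p)`, `calc(Temp)` and `calc(RH)` as three separate tasks, and it returns a list, so `out.magnitude` would fail too. The sketch below uses a thread pool and plain lists with made-up values to show the difference without needing MetPy; the worker names `first_item` and `max_temp` are hypothetical.

```python
from multiprocessing.pool import ThreadPool

# Made-up single-column values standing in for p, Temp and RH in the question.
p = [1000, 850, 700]      # pressure (hPa)
temp = [25.0, 15.0, 5.0]  # temperature (degC)
rh = [0.8, 0.6, 0.4]      # relative humidity (fraction)


def first_item(idata):
    # Mirrors `p1 = idata[0]` at the top of the question's calc().
    return idata[0]


def max_temp(idata):
    # Hypothetical per-column worker: unpack (p, t, rh) and reduce the profile.
    p1, t1, rh1 = idata
    return max(t1)


sets = p, temp, rh  # a 3-tuple, as in the question

with ThreadPool(2) as pool:
    # map() iterates over `sets`: the worker receives p, then temp, then rh.
    wrong = pool.map(first_item, sets)
    # Wrapping the tuple in a list gives one task whose argument is the whole column.
    right = pool.map(first_item, [sets])

    # Collecting every column first and submitting them in one call lets
    # several workers run at once (one column per call keeps only one busy).
    columns = [(p, temp, rh), (p, [24.0, 14.0, 4.0], [0.7, 0.5, 0.3])]
    per_column = pool.map(max_temp, columns)

print(wrong)       # [1000, 25.0, 0.8]
print(right)       # [[1000, 850, 700]]
print(per_column)  # [25.0, 24.0]
```

Applied to the question, one could build a list of `(p, Temp, RH)` tuples over all `time`/`latitude`/`longitude` points before the loop, call `pool.map(calc, columns)` once, and then take the first element of each result, since `most_unstable_cape_cin` returns a `(cape, cin)` pair.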

piyush
  • Does your comment "in a previous cell" mean that you are working in a jupyter notebook? If so, does it work to put everything in a single cell, or to not use a notebook (i.e. just a text file)? Each cell in a notebook is a different namespace, so it doesn't exactly correspond to the global namespace -- and thus when you are referencing other defined objects, you have to make sure that the objects are in the global dict. – Mike McKerns Mar 27 '23 at 11:47
  • I should've mentioned this: yes, I am working in a Jupyter notebook. Running the whole code in a single cell also doesn't fix the issue. I will look more into the namespace-related issues. Thank you for the direction. – piyush Mar 27 '23 at 12:11
  • It's probably the multiprocessing aspect. I think the easiest way to test this directly is to add the import statements inside the function `calc()` as well and see if that fixes the issue in the title. But you might have bigger fish to fry here, as I've seen in several places that `multiprocessing.Pool` & Jupyter don't mix, see [here](https://stackoverflow.com/a/75824360/8508004). Should you be using `multiprocessing.pool.ThreadPool`? See that reference, and [here](https://stackoverflow.com/a/54252710/8508004), and [here](https://jupyter-tutorial.readthedocs.io/en/stable/performance/multiprocessing.html). – Wayne Mar 27 '23 at 15:37
  • A few more questions: Are you on windows? `Pool` on windows is different than on a MacOS or Linux. I see you are importing `Pool` from both `multiprocess` and `multiprocessing` -- they are different packages. The second is part of the standard library, while the first (I'm the author) is a serialization-enhanced fork of `multiprocessing`. Do you know which package's pools you are using? If you are using `multiprocess`, you can alter the interaction with the global namespace through `dill.settings['recurse']`. Another check to see if it's a namespace issue is to use `multiprocess.dummy.Pool`. – Mike McKerns Mar 27 '23 at 15:56
  • @MikeMcKerns You should post that as an answer, because I'm pretty sure that's the issue. – DopplerShift Mar 27 '23 at 19:29
  • If serialization is the issue due to namespacing problems, then @Wayne has given a potentially straightforward workaround, which is to ensure the function `calc` is self-contained (i.e. include the imports and other dependencies inside the function definition, etc). – Mike McKerns Mar 28 '23 at 00:34
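Following the suggestions in the comments, one way to check whether this is a namespace problem is to make `calc` self-contained, importing what it needs inside the function body, and to run it with a thread-based pool (`multiprocessing.dummy.Pool`, which returns a `multiprocessing.pool.ThreadPool`). Threads run in the same process, so `calc` does not have to be pickled and sent to a worker. A minimal sketch under that assumption; to keep it runnable without MetPy, the MetPy calls are replaced by a placeholder computation (a mean temperature, not CAPE) and the real import is shown only as a comment.

```python
from multiprocessing.dummy import Pool  # thread-based pool with the multiprocessing.Pool API


def calc(idata):
    # Everything the worker needs is imported here, so it does not depend on
    # names defined in the notebook's global namespace. With MetPy this would be:
    #   from metpy.calc import dewpoint_from_relative_humidity, most_unstable_cape_cin
    import math

    p1, t1, rh1 = idata
    # Placeholder computation (NOT a CAPE calculation): mean column temperature.
    return math.fsum(t1) / len(t1)


# Made-up (pressure, temperature, relative humidity) columns.
columns = [
    ([1000, 850, 700], [25.0, 15.0, 5.0], [0.8, 0.6, 0.4]),
    ([1000, 850, 700], [24.0, 14.0, 4.0], [0.7, 0.5, 0.3]),
]

with Pool(4) as pool:
    results = pool.map(calc, columns)

print(results)  # [15.0, 14.0]
```

If a version like this works while the process-based `Pool` still raises the `NameError`, that points to serialization or namespace handling rather than to `calc` itself. Threads are subject to CPython's GIL, so whether a thread pool actually speeds up the CAPE computation would need to be measured.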
