Python program got killed after around the 400th loop

Question

My Python program gets killed after about 400 loop iterations, before it finishes my list. The program regrids rainfall data from ESRI ASCII files and exports the result as netCDF.

My system is Ubuntu 18.xx with Python 3.7. I originally ran the code in JupyterLab and thought the problem might be a JupyterLab limitation, but running it as a script gives the same result.

# Libraries
import numpy as np
import pandas as pd
import xarray as xr
import xesmf as xe
import os

# Function to read ESRI ASCII
def read_grd(filename):
    with open(filename) as infile:
        ncols = int(infile.readline().split()[1])
        nrows = int(infile.readline().split()[1])
        xllcorner = float(infile.readline().split()[1])
        yllcorner = float(infile.readline().split()[1])
        cellsize = float(infile.readline().split()[1])
        nodata_value = int(infile.readline().split()[1])
    lon = xllcorner + cellsize * np.arange(ncols)
    lat = yllcorner + cellsize * np.arange(nrows)
    value = np.loadtxt(filename, skiprows=6)

    return lon, lat, value


### Static data setting ===============================

# Input variables (date)
dir_rainfall = "./HI_1994_1999/"
dir_out = "./HI_1994_1999_regridded/"
arr = sorted(os.listdir(dir_rainfall)) #len(arr) = 50404

# Read the spatial metadata and the target grid that we're going to regrid to
ds_sm = xr.open_dataset('./Spatial_Metadata.nc',autoclose=True)

ds_out = xr.open_dataset('./regrid_frame.nc')
ds_out.rename({'XLONG_M': 'lon', 'XLAT_M': 'lat'}, inplace=True)
ds_out['lat'] = ds_out['lat'].sel(Time=0, drop=True)
ds_out['lon'] = ds_out['lon'].sel(Time=0, drop=True)

### LOOP ==============================================
i = 1690  # The run kept stopping every ~400 iterations, so I added this to restart from the last file processed before the kill.
arr = arr[i:len(arr)]

for var in arr:

    # get dateTime
    dateTime = var.replace('hawaii_',"")
    dateTime = dateTime.replace("z","")
    print("Regridding now " + str(i) + " : " + dateTime)

    # Read rainfall ASCII and calculate rain rate
    asc = read_grd(dir_rainfall + var)
    precip_rate = np.array(asc[2] / (60.0*60.0))
    precip_rate = precip_rate.astype('float')
    precip_rate[precip_rate == -9999.] = np.nan
    x = np.repeat([asc[0]], 1, axis=0).transpose()
    y = np.repeat([asc[1]], 1, axis=0)

    # Format rain rate to netCDF
    ds = xr.Dataset({'RAINRATE': (['lat', 'lon'],  precip_rate)},
                    coords={'lon': (['lon'], asc[0]),
                            'lat': (['lat'], asc[1])})

    # Regrid
    regridder = xe.Regridder(ds, ds_out, 'bilinear', reuse_weights=True) # create regridding frame

    dr = ds.RAINRATE
    dr_out = regridder(dr)

    # change the name of coordinates for the regridded netCDF
    dr_out.coords['west_east'] = ds_sm.x.values
    dr_out.coords['south_north'] = ds_sm.y.values
    dr_out = dr_out.rename({'west_east': 'x', 'south_north': 'y'})

    # add attributes to the regridded netCDF
    dr_out.attrs['esri_pe_string'] = ds_sm.crs.attrs['esri_pe_string']
    dr_out.attrs['units'] = 'mm/s'


    # Export the YYYYMMDDHHMMPRECIP_FORCING.nc
    dr_out.to_netcdf(dir_out + dateTime + '00.PRECIP_FORCING.nc')

    i = i + 1

The first ~400 ESRI ASCII files were converted successfully. After that, the program simply hung when I ran it in JupyterLab; when I ran it as a script (python xxx.py), the terminal printed "Killed" after ~400 iterations.

=== [edit 7/11/2019] Add memory usage information ===

According to @zmike, it might be a memory problem, so I printed the memory usage, and it is indeed a memory problem: memory use keeps accumulating.

I added the code below to my code.

    # Print out total memory (requires `import psutil` at the top of the script)
    process = psutil.Process(os.getpid())
    print(process.memory_info().rss)
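
For readers without psutil, the same kind of per-iteration check can be sketched with the standard library alone. This is only an illustration, assuming a Linux system (such as the Ubuntu machine above) where `/proc/self/statm` is available; `rss_bytes`, `log_growth`, and `leaky_step` are hypothetical names that are not part of the original script.

```python
import os


def rss_bytes():
    """Return this process's resident set size in bytes (Linux only)."""
    # The second field of /proc/self/statm is the number of resident pages.
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def log_growth(n_iterations, work, every=100):
    """Run work(i) repeatedly and report RSS growth every `every` iterations."""
    start = rss_bytes()
    growth = []
    for i in range(n_iterations):
        work(i)
        if i % every == 0:
            delta_mb = (rss_bytes() - start) / 1e6
            growth.append((i, delta_mb))
            print(f"iter {i}: RSS changed by {delta_mb:.1f} MB since start")
    return growth


# Demo with a deliberately leaky step: every call keeps 1 MB alive.
leak = []


def leaky_step(i):
    leak.append(bytearray(1_000_000))


growth = log_growth(20, leaky_step, every=5)
```

If the reported number climbs steadily from one iteration to the next, something created in the loop body is being kept alive, which matches the behaviour described above.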

There's too much unnecessary information here. Please cut the code snippet down to only the parts that are important. — ryansle, Jul 09 '19 at 18:49
What ryansle said. Please do add the complete error message text to your question however. — Xukrao, Jul 09 '19 at 19:31
How big are the files and where are you getting them from? Maybe they're too big and/or too many (50404+ correct?) and you're running out of memory? — zmike, Jul 09 '19 at 19:34
Sorry for the long code @ryansle, but I don't know where to cut since there was no error message. Instead, it was just stuck or 'Killed'. — Yu-Fen, Jul 10 '19 at 02:51
@Xukrao, sorry, but there was no error message; the program just got stuck or was killed. — Yu-Fen, Jul 10 '19 at 02:52
@Zmike, good points. Each ESRI ASCII file is about 6MB, and they are on my second local drive. Each output file is about 3.6MB. Is there any way to reduce the memory usage? I thought I was just overwriting the same variables rather than creating many different ones; does that still accumulate memory? Thank you! — Yu-Fen, Jul 10 '19 at 02:58
Will you please provide the source of the data? That way I can try reproducing the error. — zmike, Jul 10 '19 at 16:30
@zmike Thanks for being willing to take a look at the source data, but the data is too big to share. I added the code for printing out memory usage, and memory is accumulating. I attached a figure of the end of the output: no error message, just 'Killed'. — Yu-Fen, Jul 12 '19 at 00:58

score 1 · Answer 1 · answered Jul 12 '19 at 05:29

I'm answering my own question after searching and testing; I hope it helps people who have the same problem:

The Python program got killed because it ran out of memory. There are a few ways to reduce memory usage, including:

Delete variables that are no longer needed and call gc.collect() (see "How can I release memory after creating matplotlib figures").
Optimize the code (https://www.codementor.io/satwikkansal/python-practices-for-efficient-code-performance-memory-and-usability-aze6oiq65; https://dzone.com/articles/python-memory-issues-tips-and-tricks; https://www.reddit.com/r/Python/comments/4fcnfy/how_can_i_manage_memory_in_python_running_out_of/)
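
The first approach (delete, then collect) can be sketched as follows. This is a minimal illustration and not the code from the question: `Grid` is a hypothetical stand-in for a large per-file object, and it deliberately contains a reference cycle so that the effect of `gc.collect()` is visible.

```python
import gc
import weakref


class Grid:
    """Hypothetical stand-in for a large per-file object."""

    def __init__(self, name):
        self.name = name
        self.data = bytearray(1_000_000)  # about 1 MB of payload
        self.self_ref = self  # reference cycle: refcounting alone cannot free it


def process(name):
    grid = Grid(name)
    size = len(grid.data)
    tracker = weakref.ref(grid)  # lets us check later whether the object was freed
    del grid       # drop our own reference to the object
    gc.collect()   # run the cycle collector so the cyclic object is reclaimed
    return size, tracker


still_alive = 0
for name in ["a.asc", "b.asc", "c.asc"]:
    size, tracker = process(name)
    if tracker() is not None:
        still_alive += 1

print("objects still alive:", still_alive)
```

Note that this only releases Python objects that are no longer reachable. Objects that are still referenced somewhere, or memory held inside native libraries, are not freed by `gc.collect()`, which may be why the improvement can be only partial.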

With the steps above I could release some memory, and the program now gets through ~500 loops (~2000 loops when I run it on a server rather than my personal computer). The memory issue is still there, but it has improved.
