I'm trying to convert a netCDF (*.nc) file in classic format to CSV. The file comes from the NOAA precipitation data set.

I found helpful code in this post; however, when I run it I get this exception:

Traceback (most recent call last):
  File "./test2.py", line 31, in <module>
    precip_ts = pd.Series(precip, index=dtime)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 275, in __init__
    raise_cast_failure=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/series.py", line 4165, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional

Here is the test2.py script (same as post referenced above):

#!/usr/local/bin/python2.7

import netCDF4
import pandas as pd

precip_nc_file = 'precip.V1.0.2006.nc'
nc = netCDF4.Dataset(precip_nc_file, mode='r')
nc.variables.keys()

lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)
precip = nc.variables['precip'][:]

# a pandas.Series designed for time series of a 2D lat,lon grid
precip_ts = pd.Series(precip, index=dtime)

precip_ts.to_csv('precip.csv',index=True, header=True)

It fails at the pandas Series call. Can you give me any guidance on why pandas is failing? I thought it was supposed to handle 2D data!

The end result I'm looking for is a CSV file in which each line has lon, lat, datetime, and precip value.

1 Answer

pd.Series() expects a 1-D object, but the masked arrays read from the file can have more than one dimension. The underlying array of a masked array is available through its '.data' attribute. The code below shows how to save precip_ts to CSV.

I do not know enough about how the '.nc' file that I downloaded from here (precip.V1.0.2006.nc) is structured, and the resulting arrays have unequal numbers of elements: for example, lat has 120 values whereas lon has 300. That makes it difficult to know which values belong in the same row. If all the arrays had the same length, they could be combined into a single pandas DataFrame and saved as a CSV file (see the code at the very bottom).
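The length mismatch can be made concrete with a small sketch. It uses synthetic arrays rather than the real file, and it assumes that precip is stored with dimensions (time, lat, lon), which is common for gridded data but is not confirmed here for precip.V1.0.2006.nc. The sizes are illustrative only.

```python
import numpy as np

# Synthetic stand-ins for the file's variables (sizes are illustrative only).
time = np.arange(3)                  # 3 time steps
lat = np.linspace(20.0, 21.0, 4)     # 4 latitudes
lon = np.linspace(230.0, 232.0, 5)   # 5 longitudes

# Assumed layout: one precipitation value per (time, lat, lon) cell.
precip = np.zeros((time.size, lat.size, lon.size))

print(precip.ndim)    # number of dimensions of the grid
print(precip.shape)   # (n_time, n_lat, n_lon)
print(precip.size == time.size * lat.size * lon.size)
```

Under that assumption the coordinate arrays are expected to differ in length, because each one describes a single axis of the grid rather than a column of equal-length rows.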

Import libraries

import netCDF4
import pandas as pd
import numpy.ma as ma

Code from the question repeated below

# Open the file first; every later line reads from nc
precip_nc_file = 'precip.V1.0.2006.nc'
nc = netCDF4.Dataset(precip_nc_file, mode='r')
nc.variables.keys()

# Read the coordinate variables and the precipitation data
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)
precip = nc.variables['precip'][:]

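Edited line below: pass dtime.data (the array underlying the masked array) instead of precip, so that pd.Series() receives 1-D data. Note that the resulting series holds the time values, not the precipitation values.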

precip_ts = pd.Series(dtime.data, index=dtime)
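As a side note on masked arrays, the sketch below shows the difference between '.data' and '.filled()'. The values and the -9999.0 placeholder are made up for illustration and do not come from the NOAA file.

```python
import numpy as np
import numpy.ma as ma

# A small masked array; the middle entry is flagged as missing.
values = ma.masked_array([1.5, -9999.0, 2.0], mask=[False, True, False])

# .data returns the underlying array, including whatever is stored
# in the masked slot.
raw = values.data

# .filled(np.nan) returns a plain array with masked slots replaced by NaN.
clean = values.filled(np.nan)

print(raw)
print(clean)
```

If missing cells should appear as empty fields in the CSV, filling with NaN before building the pandas object is probably the safer of the two.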

Save as .csv

precip_ts.to_csv('precip.csv',index=True, header=True)

The code below can combine the arrays into a single DataFrame only if they all have the same length. It did not work for me because the downloaded file produced arrays of unequal lengths.

df = pd.DataFrame({
    'lat': lat,
    'lon': lon,
    'dtime': dtime,
    'precip': precip 
})
df.head(2)
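One possible way to get equal-length columns is to repeat each coordinate so that every grid cell becomes one row. This is a minimal sketch on synthetic data, not tested against the NOAA file. It assumes precip has dimensions (time, lat, lon), and grid_to_long_frame is a hypothetical helper name introduced for illustration.

```python
import numpy as np
import pandas as pd


def grid_to_long_frame(dtime, lat, lon, precip):
    """Flatten a (time, lat, lon) grid into one row per grid cell.

    Assumes precip.shape == (len(dtime), len(lat), len(lon)).
    """
    # Replace masked entries (if any) with NaN and get a plain float array.
    precip = np.ma.filled(np.ma.asarray(precip).astype(float), np.nan)

    # Index grids with the same shape as precip; 'ij' keeps the axis order.
    t_idx, lat_idx, lon_idx = np.meshgrid(
        np.arange(len(dtime)), np.arange(len(lat)), np.arange(len(lon)),
        indexing='ij')

    # ravel() walks all arrays in the same order, so rows stay aligned.
    return pd.DataFrame({
        'lon': np.asarray(lon)[lon_idx.ravel()],
        'lat': np.asarray(lat)[lat_idx.ravel()],
        'datetime': np.asarray(dtime)[t_idx.ravel()],
        'precip': precip.ravel(),
    })


# Synthetic example: 2 time steps, 2 latitudes, 3 longitudes.
dtime = pd.date_range('2006-01-01', periods=2)
lat = np.array([20.0, 20.25])
lon = np.array([230.0, 230.25, 230.5])
precip = np.arange(12, dtype=float).reshape(2, 2, 3)

df = grid_to_long_frame(dtime, lat, lon, precip)
csv_text = df.to_csv(index=False)  # pass a filename instead to write a file
print(df.head())
```

With the real file, the arrays read from nc.variables would take the place of the synthetic ones, provided the dimension order matches the assumption above.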
Nilesh Ingle