
I am trying to convert a .csv file to a netCDF4 file with Python, but I cannot work out how to store data from a .csv table in a netCDF file. My main question is: how do I declare variables from the table columns in a workable netCDF4 format? Everything I have found covers the opposite direction, extracting data from a netCDF4 file to .csv or ASCII. Below are my sample data, my sample code, and the error I get while building the arrays. Any help would be much appreciated.

The sample table is below:

Station Name  Country  Code   Lat    Lon     mn.yr   temp1  temp2  temp3  hpa
Somewhere     US       12340  35.52  23.358  1.19    -8.3   -13.1  -5     69.5
Somewhere     US       12340                 2.1971  -10.7  -13.9  -7.9   27.9
Somewhere     US       12340                 3.1971  -8.4   -13    -4.3   90.8

My sample code is:

#!/usr/bin/env python

import scipy
import numpy
import netCDF4
import csv

from numpy import arange, dtype 

#Declare empty arrays

v1 = []
v2 = []
v3 = []
v4 = []
v5 = []

# Open csv file and declare variable for arrays for each heading

f = open('station_data.csv', 'r').readlines()

for line in f[1:]:
    fields = line.split(',')
    v1.append(fields[0]) #station
    v2.append(fields[1])#country
    v3.append(int(fields[2]))#code
    v4.append(float(fields[3]))#lat
    v5.append(float(fields[4]))#lon
#more variables included but this is just an abridged list
print v1
print v2
print v3
print v4

#convert to netcdf4 framework that works as a netcdf

ncout = netCDF4.Dataset('station_data.nc','w') 

# latitudes and longitudes. Include NaN for missing numbers

lats_out = -25.0 + 5.0*arange(v4,dtype='float32')
lons_out = -125.0 + 5.0*arange(v5,dtype='float32')

# output data.

press_out = 900. + arange(v4*v5,dtype='float32') # 1d array
press_out.shape = (v4,v5) # reshape to 2d array
temp_out = 9. + 0.25*arange(v4*v5,dtype='float32') # 1d array
temp_out.shape = (v4,v5) # reshape to 2d array

# create the lat and lon dimensions.

ncout.createDimension('latitude',v4)
ncout.createDimension('longitude',v5)

# Define the coordinate variables. They will hold the coordinate information

lats = ncout.createVariable('latitude',dtype('float32').char,('latitude',))
lons = ncout.createVariable('longitude',dtype('float32').char,('longitude',))

# Assign units attributes to coordinate var data. This attaches a text attribute to each of the coordinate variables, containing the units.

lats.units = 'degrees_north'
lons.units = 'degrees_east'

# write data to coordinate vars.

lats[:] = lats_out
lons[:] = lons_out

# create the pressure and temperature variables

press = ncout.createVariable('pressure',dtype('float32').char,('latitude','longitude'))
temp = ncout.createVariable('temperature',dtype('float32').char,('latitude','longitude'))

# set the units attribute.

press.units =  'hPa'
temp.units = 'celsius'

# write data to variables.

press[:] = press_out
temp[:] = temp_out

ncout.close()
f.close()

error:

Traceback (most recent call last):
  File "station_data.py", line 33, in <module>
    v4.append(float(fields[3]))#lat
ValueError: could not convert string to float: 
Bruno Gelb
user3275006
  • The error says that value in `fields[3]` is not a number, hence it cannot be converted to float. Check your input file for this value. You can also try printing the value of `fields[3]` before converting it to float and adding to list `v4` – vaibhaw Apr 08 '14 at 11:20
  • Thank you very much for clarifying that. You are correct, by just printing it it worked but I wasn't confident that it would transfer well when going into a netcdf. These are latitudes so by assigning them any data type, is that okay when transferring over to netcdf? – user3275006 Apr 08 '14 at 14:23

3 Answers


This is a perfect job for xarray, a Python package whose Dataset object represents the netCDF common data model. Here's an example you can try:

import pandas as pd
import xarray as xr

url = 'http://www.cpc.ncep.noaa.gov/products/precip/CWlink/'

ao_file = url + 'daily_ao_index/monthly.ao.index.b50.current.ascii'
nao_file = url + 'pna/norm.nao.monthly.b5001.current.ascii'

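# columns are separated by runs of whitespace; the raw string keeps \s intact
# note: squeeze=True was removed in pandas 2.0; on newer versions drop it
# and call .squeeze('columns') on the result of read_csv instead
kw = dict(sep=r'\s+', parse_dates={'dates': [0, 1]},
          header=None, index_col=0, squeeze=True, engine='python')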

# read into Pandas Series
s1 = pd.read_csv(ao_file, **kw)
s2 = pd.read_csv(nao_file, **kw)

s1.name='AO'
s2.name='NAO'

# concatenate two Pandas Series into a Pandas DataFrame
df=pd.concat([s1, s2], axis=1)

# create xarray Dataset from Pandas DataFrame
xds = xr.Dataset.from_dataframe(df)

# add variable attribute metadata
xds['AO'].attrs={'units':'1', 'long_name':'Arctic Oscillation'}
xds['NAO'].attrs={'units':'1', 'long_name':'North Atlantic Oscillation'}

# add global attribute metadata
xds.attrs={'Conventions':'CF-1.0', 'title':'AO and NAO', 'summary':'Arctic and North Atlantic Oscillation Indices'}

# save to netCDF
xds.to_netcdf('/usgs/data2/notebook/data/ao_and_nao.nc')

Then running ncdump -h ao_and_nao.nc produces:

netcdf ao_and_nao {
dimensions:
        dates = 782 ;
variables:
        double dates(dates) ;
                dates:units = "days since 1950-01-06 00:00:00" ;
                dates:calendar = "proleptic_gregorian" ;
        double NAO(dates) ;
                NAO:units = "1" ;
                NAO:long_name = "North Atlantic Oscillation" ;
        double AO(dates) ;
                AO:units = "1" ;
                AO:long_name = "Arctic Oscillation" ;

// global attributes:
                :title = "AO and NAO" ;
                :summary = "Arctic and North Atlantic Oscillation Indices" ;
                :Conventions = "CF-1.0" ;

Note that you can install xarray using pip, but if you are using the Anaconda Python Distribution, you can install it from the Anaconda.org/conda-forge channel by using:

conda install -c conda-forge xarray
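
The same approach can be sketched for the station table from the question. This is only a minimal illustration under some assumptions: the file is taken to be comma-separated with empty fields where Lat and Lon are missing, the data is inlined here through `io.StringIO` instead of being read from `station_data.csv`, and the `obs` index name and the renamed columns (`station`, `mn_yr`) are choices made for this example, not anything the question prescribes.

```python
import io

import pandas as pd
import xarray as xr

# Inline copy of the sample table from the question, written as CSV.
# Empty fields stand for the missing Lat/Lon values.
csv_text = (
    "Station Name,Country,Code,Lat,Lon,mn.yr,temp1,temp2,temp3,hpa\n"
    "Somewhere,US,12340,35.52,23.358,1.19,-8.3,-13.1,-5,69.5\n"
    "Somewhere,US,12340,,,2.1971,-10.7,-13.9,-7.9,27.9\n"
    "Somewhere,US,12340,,,3.1971,-8.4,-13,-4.3,90.8\n"
)

# read_csv turns empty numeric fields into NaN by default
df = pd.read_csv(io.StringIO(csv_text))

# assumption: simpler variable names without spaces or dots
df = df.rename(columns={"Station Name": "station", "mn.yr": "mn_yr"})

# the row number becomes the single dimension of the dataset
df.index.name = "obs"
ds = xr.Dataset.from_dataframe(df)

# attach units as variable attributes
ds["Lat"].attrs = {"units": "degrees_north"}
ds["Lon"].attrs = {"units": "degrees_east"}
ds["hpa"].attrs = {"units": "hPa"}

print(ds)
```

Calling `ds.to_netcdf('station_data.nc')` afterwards would write the file, exactly as `xds.to_netcdf(...)` does in the AO/NAO example.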
Eric Bridger
Rich Signell
  • pandas.errors.ParserError: Expected 21 fields in line 10, saw 22. Error could possibly be due to quotes being ignored when a multi-char delimiter is used. – otocan Feb 18 '22 at 11:52

While xarray, mentioned above, is a great tool, it is also worth looking at the UK Met Office's Iris library. A key advantage of Iris is that it helps you create netCDF files that follow the Climate and Forecast (CF) conventions. It does this by providing helper functions to define standard names, units, coordinate systems, and other metadata conventions. It also provides plotting, subsetting, and analysis utilities.

For earth science data such as this, CF is the recommended standard for netCDF files.

As an example of its use, this Python notebook re-implements the AO/NAO example above.

David LeBauer
ocefpaf
  • `iris` can be tough to install, but if you are using Anaconda, you can install it using `conda install -c conda-forge iris` by virtue of the Anaconda.org/conda-forge channel – Rich Signell Jan 27 '17 at 17:04

If you look at your input file, there is no value in the Lat column in the second data row. When the csv file is read, that field, i.e. fields[3], is stored as an empty string "". An empty string cannot be converted to a float, which is why you get a ValueError. Instead of calling float directly, you can define a helper function that handles this case:

def str_to_float(text):
    try:
        number = float(text)
    except ValueError:
        # empty or non-numeric field: replace 0.0 with whatever
        # placeholder value suits your requirement
        number = 0.0
    return number

Now you can use this function in place of the built-in float function, like this:

v4.append(str_to_float(fields[3]))
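
The whole read loop can be sketched along the same lines. This is a minimal illustration, not the asker's actual script: it assumes the file is comma-separated with the header shown in the question, it inlines the sample rows through `io.StringIO` instead of opening `station_data.csv`, and `parse_float` and `read_station_rows` are hypothetical helper names. It also uses NaN instead of 0.0 as the placeholder, on the assumption that a missing latitude should not be mistaken for a real value of 0.

```python
import csv
import io
import math


def parse_float(text):
    """Return float(text), or NaN when the field is empty or not numeric."""
    try:
        return float(text)
    except ValueError:
        return math.nan


def read_station_rows(handle):
    """Read the station table into a dict of column lists (hypothetical helper)."""
    columns = {"station": [], "country": [], "code": [], "lat": [], "lon": []}
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    for fields in reader:
        columns["station"].append(fields[0])
        columns["country"].append(fields[1])
        columns["code"].append(int(fields[2]))
        columns["lat"].append(parse_float(fields[3]))
        columns["lon"].append(parse_float(fields[4]))
    return columns


# Inline copy of the sample table from the question, written as CSV.
sample = io.StringIO(
    "Station Name,Country,Code,Lat,Lon,mn.yr,temp1,temp2,temp3,hpa\n"
    "Somewhere,US,12340,35.52,23.358,1.19,-8.3,-13.1,-5,69.5\n"
    "Somewhere,US,12340,,,2.1971,-10.7,-13.9,-7.9,27.9\n"
    "Somewhere,US,12340,,,3.1971,-8.4,-13,-4.3,90.8\n"
)
columns = read_station_rows(sample)
print(columns["lat"])  # [35.52, nan, nan]
```

Using the csv module instead of `line.split(',')` also copes with quoted fields that contain commas.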
vaibhaw
  • Have a look at [this](http://stackoverflow.com/questions/379906/parse-string-to-float-or-int) SO question which gives more insight into string to int or float conversion. – vaibhaw Apr 08 '14 at 19:22
  • Thank you very much for your thorough explanation. I didn't realize it was storing this as an empty string. This new method makes sense and works fantastic. – user3275006 Apr 09 '14 at 06:49
  • Would it be possible to inquire about possible solutions for part 2 of this problem? Are there any resources that can provide a clearer explanation of how to import the declared variables from the above .csv file into a netCDF4 file? There doesn't seem to be a great deal of information of the conversion from .csv to netcdf. I have been replacing the press_out, temp_out, lats, and lons with the v (1,2,3...etc) variables but it is not registering the information I am trying to transfer into a netcdf4 format. Would you be able to provide any additional assistance? – user3275006 Apr 09 '14 at 11:21
  • I am sorry. I am not familiar with netCDF4. I think you should treat this problem as 2 sub problems: 1. Read data from csv, store it in variables.(You have already done it) 2. Use this data stored in variables to feed them to netCDF variables. I think you should check the [documentation](http://unidata.github.io/netcdf4-python/netCDF4-module.html) for the second part. In case you get any errors while doing so, post the errors so that cause of that error can be identified. – vaibhaw Apr 09 '14 at 12:01