4

My aim is to access data from a NetCDF file and write it to a CSV file in the following format:

Latitude  Longitude Date1  Date2  Date3
100       200       <-- MIN_SFC values -->

So far I have accessed the variables, written the header to the file and populated the lat/lons.

How can I access the MIN_SFC values for specified lon,lat coordinates and dates and then write them to a CSV file?

I'm a Python newbie; if there is a better way to go about this, please let me know.

NetCDF file info:

Dimensions:
  time = 7 
  latitude = 292
  longitude = 341

Variables:
  float MIN_SFC (time=7, latitude = 292, longitude = 341)

Here's what I've tried:

 from netCDF4 import Dataset, num2date

 filename = "C:/filename.nc"

 nc = Dataset(filename, 'r', Format='NETCDF4')
 print nc.variables

 print 'Variable List'

 for var in nc.variables:
    print var, var.units, var.shape

 # get coordinates variables
 lats = nc.variables['latitude'][:]
 lons = nc.variables['longitude'][:]

 sfc= nc.variables['Min_SFC'][:]
 times = nc.variables['time'][:]

 # convert date, how to store date only strip away time?
 print "Converting Dates"
 units = nc.variables['time'].units
 dates = num2date (times[:], units=units, calendar='365_day')

 #print [dates.strftime('%Y%m%d%H') for date in dates]

 header = ['Latitude', 'Longitude']

 # append dates to header string

 for d in dates:
    print d
    header.append(d)

 # write to file
 import csv

 with open('Output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    outputwriter.writerow(header)
    for lat, lon in zip(lats, lons):
      outputwriter.writerow( [lat, lon] )
 
 # close the output file
 csvFile.close()

 # close netcdf
 nc.close()

UPDATE:

I've updated the code that writes the CSV file. There's an attribute error because the lat/lon values are doubles:

AttributeError: 'numpy.float32' object has no attribute 'append'

Is there a way to cast to a string in Python? Would that work?

I've noticed a number of values returned as "--" when I printed values to the console. I'm wondering if this represents the fillValue or missingValue defined as -32767.0.
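A minimal sketch of what the "--" entries are likely to be, assuming the variable declares -32767.0 as its _FillValue or missing_value: netCDF4-python masks such values by default and returns a NumPy masked array, which prints masked cells as "--". The small array below is a hypothetical stand-in for a slice of MIN_SFC, not data from the file.

```python
import numpy as np

# Hypothetical 2x3 slice of MIN_SFC; -32767.0 stands in for the fill value.
fill_value = -32767.0
raw = np.array([[1.5, fill_value, 3.0],
                [fill_value, 2.5, 4.0]])

# Build a masked array equivalent to what a masked netCDF read would return.
sfc_slice = np.ma.masked_values(raw, fill_value)

print(sfc_slice)  # masked cells are displayed as "--"

# Replace masked cells with a plain number (here NaN) before writing to CSV.
filled = sfc_slice.filled(np.nan)
n_missing = int(np.ma.count_masked(sfc_slice))
print(n_missing)
```

Calling filled() with a value of your choice (NaN, an empty marker, or the original fill value) gives an ordinary array, so the CSV never contains the "--" placeholder.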

I'm also wondering whether the variables of the 3d dataset should be accessed by lats = nc.variables['latitude'][:][:] or lats = nc.variables['latitude'][:][:,:] ?

# the csv file is closed when you leave the block
with open('output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
         t = num2date(time, units = units, calendar='365_day')
         header.append(t)
    outputwriter.writerow(header)  
    for lat_index, lat in enumerate(lats):
        content = lat
        print lat_index
        for lon_index, lon in enumerate(lons):
            content.append(lon)
            print lon_index    
            for time_index, time in enumerate(times): # for a date
                # pull out the data 
                data = sfc[time_index,lat_index,lon_index]
                content.append(data)
                outputwriter.writerow(content)
Jules0080
  • Why do you need it as CSV? Since Dataset stores data as Numpy arrays, you're probably best off using the built-in `numpy.savetxt` function for writing to text files documented [here](http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html). – Spencer Hill Feb 10 '15 at 13:51
  • Are you wanting to search the lat/lon arrays for a given point and find the corresponding values for Min_SFC? – DopplerShift Feb 10 '15 at 14:38
  • DopplerShift I am wanting to iterate through the lat/lons for a date and write the Min_SFC like the example table provided in the post. I don't want to search or find a specific lat/lon or date my main issue is making sure all the data written to the file is for the same record. – Jules0080 Feb 11 '15 at 02:25
  • I looked into numpy.savetxt and found an example where commas were inserted to create a CSV file. I'm not sure how to format the output according to the example (table) I provided in the post, with latitude, longitude, dates and sfc data in an output file. – Jules0080 Feb 11 '15 at 06:03
  • `numpy.savetxt` has `header` and `delimiter` kwargs. The former should enable you to specify the top line you want, and the latter should enable you to put in tabs as necessary to make the columns you want. The best way to access subsets of a Numpy array (i.e. your desired lat and lon ranges) is via Numpy's [slicing](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html). – Spencer Hill Feb 12 '15 at 16:40
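The savetxt suggestion in these comments could be sketched as follows. This is only an illustration under stated assumptions: the tiny lats, lons, date_strings and sfc arrays are hypothetical stand-ins for the values read from the file, and sfc is assumed to be ordered (time, latitude, longitude) as in the question.

```python
import io
import numpy as np

# Hypothetical small grid; in practice these come from nc.variables[...][:].
lats = np.array([10.0, 20.0])
lons = np.array([100.0, 110.0, 120.0])
date_strings = ['20150201', '20150202']
sfc = np.arange(12, dtype=float).reshape(2, 2, 3)  # (time, latitude, longitude)

# One (lat, lon) pair per grid point, in the same order as sfc's last two axes.
lat_grid, lon_grid = np.meshgrid(lats, lons, indexing='ij')

# Flatten each time slice into a column: shape (n_points, n_times).
values = sfc.reshape(sfc.shape[0], -1).T

# Columns: Latitude, Longitude, then one column per date.
table = np.column_stack([lat_grid.ravel(), lon_grid.ravel(), values])

buffer = io.StringIO()  # pass a filename such as 'output.csv' to write to disk
np.savetxt(buffer, table, delimiter=',', fmt='%g',
           header=','.join(['Latitude', 'Longitude'] + date_strings),
           comments='')  # comments='' drops the default "# " header prefix
csv_text = buffer.getvalue()
print(csv_text)
```

Each row then holds one grid point followed by its value on every date, which matches the table layout in the question without any explicit Python loops.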

4 Answers

7

I would load the data into Pandas, which facilitates the analysis and plotting of time series data, as well as writing to CSV.

So here's a real working example which pulls a time series of wave heights from a specified lon,lat location out of a global forecast model dataset.

Note: here we access an OPeNDAP dataset so we can extract just the data we need from a remote server without downloading files. But netCDF4 works exactly the same for a remote OPeNDAP dataset or a local NetCDF file, which is a very useful feature!

import datetime as dt
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

# NetCDF4-Python can read a remote OPeNDAP dataset or a local NetCDF file:
url='http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/WW3/Global/Best'
nc = netCDF4.Dataset(url)
nc.variables.keys()

lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)

# determine what longitude convention is being used [-180,180], [0,360]
print lon.min(),lon.max()

# specify some location to extract time series
lati = 41.4; loni = -67.8 +360.0  # Georges Bank

# find closest index to specified value
def near(array,value):
    idx=(abs(array-value)).argmin()
    return idx

# Find nearest point to desired location (could also interpolate, but more work)
ix = near(lon, loni)
iy = near(lat, lati)

# Extract desired times.      
# 1. Select -+some days around the current time:
start = dt.datetime.utcnow()- dt.timedelta(days=3)
stop = dt.datetime.utcnow()+ dt.timedelta(days=3)
#       OR
# 2. Specify the exact time period you want:
#start = dt.datetime(2013,6,2,0,0,0)
#stop = dt.datetime(2013,6,3,0,0,0)

istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print istart,istop

# Get all time records of variable [vname] at indices [iy,ix]
vname = 'Significant_height_of_wind_waves_surface'
#vname = 'surf_el'
var = nc.variables[vname]
hs = var[istart:istop,iy,ix]
tim = dtime[istart:istop]

# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=vname)

# Use Pandas time series plot method
ts.plot(figsize=(12,4),
   title='Location: Lon=%.2f, Lat=%.2f' % ( lon[ix], lat[iy]),legend=True)
plt.ylabel(var.units);

#write to a CSV file
ts.to_csv('time_series_from_netcdf.csv')

This both creates a plot to verify that you've got the data you wanted and writes the desired CSV file time_series_from_netcdf.csv to disk.

You can also view, download and/or run this example on Wakari.

Rich Signell
  • oh, I see now that maybe I didn't read the question carefully enough. I thought a time series was desired at a specified location. Perhaps that's not the objective. – Rich Signell Feb 10 '15 at 15:03
  • I'm trying to write out the data to CSV without knowing any information other than the netcdf filename and variable names. – Jules0080 Feb 11 '15 at 06:02
2

Rich Signell's answer was incredibly helpful! Just as a note, it's important to also import datetime, and when extracting times, it's necessary to use the following code:

import datetime
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

...

# 2. Specify the exact time period you want:
start = datetime.datetime(2005,1,1,0,0,0)
stop = datetime.datetime(2010,12,31,0,0,0)

I then looped over all the regions that I needed for my dataset.

Nathan Tuggy
aliki43
0

Not sure what you're still having trouble with; this looks good. I do see:

# convert date, how to store date only strip away time?
 print "Converting Dates"
 units = nc.variables['time'].units
 dates = num2date (times[:], units=units, calendar='365_day')

You now have the dates as Python datetime objects.

 #print [dates.strftime('%Y%m%d%H') for date in dates]

and this is what you need if you want them as strings -- but if you only want the day, remove the %H:

date_strings = [dates.strftime('%Y%m%d') for date in dates]

If you want the year, month and day as numbers, datetime objects have attributes for that:

dt.year, dt.month, dt.day
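Note that strftime has to be called on each element of the sequence rather than on the array itself. A minimal sketch, assuming num2date returned datetime-like objects (the two dates below are hypothetical stand-ins):

```python
import datetime

# Hypothetical stand-ins for the objects returned by num2date.
dates = [datetime.datetime(2015, 2, 1, 12, 0),
         datetime.datetime(2015, 2, 2, 12, 0)]

# Call strftime on each element, not on the container.
date_strings = [date.strftime('%Y%m%d') for date in dates]

# The date strings can then be appended to the CSV header.
header = ['Latitude', 'Longitude'] + date_strings
print(header)
```

Building the header from strings rather than datetime objects also keeps the time of day out of the column names.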

As for your sfc variable -- it is a 3-d array, so to get a particular value, you can do:

sfc[time_index, lat_index, lon_index]

Being 3-d, there is more than one way to write it to a CSV file, but I'm guessing you might want something like:

for time_index, time in enumerate(times):
    # pull out the data for that time
    data = sfc[time_index, :, :]
    # write the date to the file (maybe)
    # ....
    # Now loop through the "rows"
    for row in data:
        outputwriter.writerow( [str(val) for val in row] )

Or something like that....

Chris Barker
  • I'm getting an Attribute error when taking away the time from the date field any idea why? I'm using Anaconda Spyder as my IDE. AttributeError: 'numpy.ndarray' object has no attribute 'strftime' date_strings = [dates.strftime('%Y%m%d') for date in dates] – Jules0080 Feb 11 '15 at 06:01
  • I've pip installed the numpy module and imported all the library, from numpy import * – Jules0080 Feb 11 '15 at 07:21
  • Typo there, you want: [date.strftime('%Y%m%d') for date in dates] (you are calling strftime on each "date" in the sequence of "dates"). – Chris Barker Feb 12 '15 at 17:17
0

The attribute error occurs because content needs to be a list, but you initialize it with lat, which is just a number. You need to put that number into a list.

Regarding the 3D variables, lats = nc.variables['latitude'][:] is sufficient to read all the data.

Update: Iterate over lon/lat together

Here's your code with the mod for the list and iteration:

# the csv file is closed when you leave the block
with open('output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
        t = num2date(time, units = units, calendar='365_day')
        header.append(t)
    outputwriter.writerow(header)

    for latlon_index, (lat,lon) in enumerate(zip(lats, lons)):
        content = [lat, lon] # Put lat and lon into list
        print latlon_index
        for time_index, time in enumerate(times): # for a date
            # pull out the data 
            data = sfc[time_index,latlon_index,latlon_index]
            content.append(data)
            outputwriter.writerow(content)

I haven't actually tried to run this, so there may be other problems lurking.
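If every lat/lon combination is wanted rather than positional pairs, a variant along these lines may be closer to the table in the question. It is a sketch under assumptions, not a tested drop-in: the small lists are hypothetical stand-ins for the netCDF variables, and with real NumPy arrays the lookup would be sfc[time_index, lat_index, lon_index].

```python
import csv
import io
from itertools import product

# Hypothetical tiny grid standing in for the netCDF variables.
lats = [10.0, 20.0]
lons = [100.0, 110.0, 120.0]
date_strings = ['20150201', '20150202']
# sfc[time_index][lat_index][lon_index], same dimension order as MIN_SFC
sfc = [[[1, 2, 3], [4, 5, 6]],
       [[7, 8, 9], [10, 11, 12]]]

buffer = io.StringIO()  # swap for open('output.csv', 'w', newline='') to write a file
writer = csv.writer(buffer)
writer.writerow(['Latitude', 'Longitude'] + date_strings)

# product() visits every (lat, lon) pair; zip() would only pair items by position.
for (lat_index, lat), (lon_index, lon) in product(enumerate(lats), enumerate(lons)):
    row = [lat, lon]
    for time_index in range(len(date_strings)):
        row.append(sfc[time_index][lat_index][lon_index])
    # One row per grid point, written once all dates have been collected.
    writer.writerow(row)

csv_text = buffer.getvalue()
print(csv_text)
```

Writing the row after the inner time loop finishes gives one complete line per grid point instead of a partial line per date.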

DopplerShift
  • content = [lat] solved the error. The logic of the code I posted is wrong: for each lat index the code iterates through all the lon indexes, instead of making sure the same index is used for both lat and lon when calling data = sfc[time_index,lat_index,lon_index]. That's the main issue at the moment. The next code block, iterating through time, is correct because I need the sfc value for each of the times for the same lat/lon. – Jules0080 Feb 12 '15 at 06:04
  • DopplerShift it worked. I moved the outputwriter outside the time for loop, the rows are written to the file correctly. – Jules0080 Feb 17 '15 at 05:45
  • `zip(lats, lons)` does not give you every combination of lat/lon. It should be `product(lats, lons)`, using `itertools.product`. – alphabetasoup Dec 13 '22 at 02:35