How to make a histogram from this nc file?

Question

I'm a research assistant and I've recently started to learn python to interpret model output in netCDF file format. Let me give a quick background on my question:

I have already searched through a certain grid area of a netCDF file using the netCDF4 module and stored an array of times, which I then converted to a list of dates using netCDF4's num2date feature. I have shown my code below. Please note that restrictedrange is a subset of a variable from an nc file and rmduplicates() is not shown.

import netCDF4 as nc
import numpy as np
import matplotlib.pyplot as pyp
import matplotlib as mpl
import datetime as dtm
flor = nc.Dataset('FLOR.slp_subset1.nc','r')    

times = []
timecounter = .25
for i in restrictedrange:
    for j in np.nditer(i):
        if(j <= 975):
            times.append(timecounter)
    timecounter += .25
uniquetimes = rmduplicates(times)
dates = nc.num2date(uniquetimes,'days since 0001-01-01 00:00:00','julian')

stacked_dates = []
for date in dates:
    stacked_dates.append(date.replace(year=1))
stacked_dates = mpl.dates.date2num(stacked_dates)

fig = pyp.figure()
ax = pyp.subplot(111)
ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
format = mpl.dates.DateFormatter('%m/%d')
ax.xaxis.set_major_formatter(format)

ax.hist(stacked_dates)

pyp.xticks(rotation='vertical')

pyp.show()

Now I have a list of dates in the format "(y)yy-mm-dd hh:mm:ss". I would like to take those dates and make a histogram by month (possibly using matplotlib, or whatever is best for this): the bars are frequencies and the bins are months. Also, in case it wasn't clear from my format, some years have three digits and some only two, but none have just one.

Again, I'm quite new to python so I appreciate any help and I apologize if this question is poorly formatted, as I have never used this website.

Thanks!

It's unclear exactly what you want ... I feel like most of the beginning of this question is actually irrelevant. Is it just that you have a list of string dates and want to get a histogram? — Ajean, Feb 19 '15 at 23:39
I just put it there in case it was necessary to see how I am generating output (using netCDF4 instead of matplotlib for num2date for example). However, you are correct, I have a list of string dates and want a histogram based on the monthly frequency. — LonelyHeartsClub, Feb 20 '15 at 07:33

score 3 · Accepted Answer · edited May 23 '17 at 12:06

I don't know what you have for data, but here's a mock example of how to make a histogram with months/days on the x axis.

I can only assume that you start with a list of datetime objects, but I can't figure out what nc is (is that the matplotlib.dates module?) or exactly what kind of times can be found in uniquetimes. So in general, this is the approach.

These are the modules you will need.

import matplotlib as mpl
import matplotlib.pyplot as plt
import datetime

These are the mock dates I've used for this example. Only 11 months are covered, so almost every bin will end up with a count of 1.

dates = []
for i in range(1, 12):
    dates.append(datetime.datetime(i*5+1960, i, i, i, i, i))

>>> dates
[datetime.datetime(1965, 1, 1, 1, 1, 1), datetime.datetime(1970, 2, 2, 2, 2, 2), datetime.datetime(1975, 3, 3, 3, 3, 3), datetime.datetime(1980, 4, 4, 4, 4, 4), datetime.datetime(1985, 5, 5, 5, 5, 5), datetime.datetime(1990, 6, 6, 6, 6, 6), datetime.datetime(1995, 7, 7, 7, 7, 7), datetime.datetime(2000, 8, 8, 8, 8, 8), datetime.datetime(2005, 9, 9, 9, 9, 9), datetime.datetime(2010, 10, 10, 10, 10, 10), datetime.datetime(2015, 11, 11, 11, 11, 11)]

If, like in the above example, you're dealing with different years, you're going to have to "stack" them yourself. Otherwise the date2num function I'll use later will produce wildly different numbers. To "stack" them means to convert them as if they all happened in the same year.

stacked_dates = []
for date in dates:
    stacked_dates.append( date.replace(year=2000)  )

>>> stacked_dates
[datetime.datetime(2000, 1, 1, 1, 1, 1), datetime.datetime(2000, 2, 2, 2, 2, 2), datetime.datetime(2000, 3, 3, 3, 3, 3), datetime.datetime(2000, 4, 4, 4, 4, 4), datetime.datetime(2000, 5, 5, 5, 5, 5), datetime.datetime(2000, 6, 6, 6, 6, 6), datetime.datetime(2000, 7, 7, 7, 7, 7), datetime.datetime(2000, 8, 8, 8, 8, 8), datetime.datetime(2000, 9, 9, 9, 9, 9), datetime.datetime(2000, 10, 10, 10, 10, 10), datetime.datetime(2000, 11, 11, 11, 11, 11)]
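
One detail of the stacking step that is easy to miss: the target year should be a leap year if the data may contain 29 February. The sketch below is only an illustration with a made-up date and a hypothetical helper name (stack_to_year); it shows that datetime.replace raises ValueError when the day does not exist in the target year.

```python
import datetime

def stack_to_year(dates, year=2000):
    """Return copies of `dates` moved into `year` (hypothetical helper)."""
    return [d.replace(year=year) for d in dates]

# Made-up sample: a leap day from 1996.
leap_day = datetime.datetime(1996, 2, 29, 6, 0, 0)

# 2000 is a leap year, so 29 February survives the move.
stacked = stack_to_year([leap_day], year=2000)

# 2001 is not a leap year, so the same call raises ValueError.
try:
    stack_to_year([leap_day], year=2001)
    failed = False
except ValueError:
    failed = True
```

This is one plausible source of a "day is out of range for month" error when stacking onto a non-leap year.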

Ok. Now we can use the date2num function to get something mpl actually understands. (Btw, if you just want to plot this data, you can use the plt.plot_date function, which understands datetime objects.)

stacked_dates = mpl.dates.date2num(stacked_dates)

>>> stacked_dates
array([ 730120.04237269,  730152.08474537,  730182.12711806,
        730214.16949074,  730245.21186343,  730277.25423611,
        730308.2966088 ,  730340.33898148,  730372.38135417,
        730403.42372685,  730435.46609954])

Ok, now for the plotting itself. mpl can understand these numbers, but it will not automatically assume they are dates. It will treat them as normal numbers. That's why we've got to tell the x axis that they're actually dates. Do that with set_major_formatter and set_major_locator.

fig = plt.figure()
ax = plt.subplot(111)
ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
format = mpl.dates.DateFormatter('%m/%d') #explore other options of display
ax.xaxis.set_major_formatter(format)

ax.hist(stacked_dates) #plot the damned thing

plt.xticks(rotation='vertical') #avoid overlapping numbers
                           #make sure you do this AFTER .hist function

plt.show()

This code produces the following graph:

[image]

Do note that there's a chance you won't be able to see the dates on your graph because they'll run off screen (formats like these can be long and may not fit on the graph). In that case press the "configure subplots" button and adjust the value for "bottom". In the script you can do that with plt.subplots_adjust(bottom=.3) or some other value.

You should also take care to specify 12 bins, as in ax.hist(stacked_dates, bins=12), because the default is 10, which will look funky like my graph.
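
Note that bins=12 gives twelve equal-width bins between the smallest and largest value, which do not necessarily line up with calendar months. A minimal sketch of one alternative, assuming the dates were stacked onto the year 2000 as above: pass the first day of each month (plus 1 January 2001) as explicit bin edges. The names month_edges and counts are only illustrative.

```python
import datetime
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Mock data: one date per month from January to November, already stacked onto 2000.
dates = [datetime.datetime(2000, i, i, i, i, i) for i in range(1, 12)]
stacked = mdates.date2num(dates)

# 13 edges give 12 bins, one per calendar month of the stacking year.
edges = [datetime.datetime(2000, m, 1) for m in range(1, 13)]
edges.append(datetime.datetime(2001, 1, 1))
month_edges = mdates.date2num(edges)

# The same edges work for a plain count and for the plot.
counts, _ = np.histogram(stacked, bins=month_edges)

fig, ax = plt.subplots()
ax.hist(stacked, bins=month_edges)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m/%d'))
```

With these edges every bar covers exactly one month, so months of different lengths are not split or merged by the binning.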

Also there's a simpler, albeit less customizable, possibility: using a bar plot instead of a histogram. Read about it HERE. But it really depends on what kind of information you have. If it's a lot of dates, it's probably easier to let the hist function calculate the bin heights than to do it yourself. If it's some other info, it's worthwhile to consider using a bar plot.
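
A rough sketch of the bar plot route, assuming each date object exposes a .month attribute (standard datetime objects do; whether a given library's date objects do is an assumption to check). Counting by month number needs neither stacking nor date2num. The names month_counts and heights are illustrative.

```python
import collections
import datetime
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line for an interactive window
import matplotlib.pyplot as plt

# Mock dates spread over different years, as in the example above.
dates = [datetime.datetime(i * 5 + 1960, i, i, i, i, i) for i in range(1, 12)]

# Count how many dates fall in each month, ignoring the year entirely.
month_counts = collections.Counter(d.month for d in dates)
months = list(range(1, 13))
heights = [month_counts.get(m, 0) for m in months]

fig, ax = plt.subplots()
ax.bar(months, heights)
ax.set_xticks(months)
ax.set_xlabel("month")
ax.set_ylabel("frequency")
```

The trade-off is that the x axis is a plain month number rather than a date axis, so the date locators and formatters no longer apply.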

Complete script would be something like:

import matplotlib as mpl
import matplotlib.pyplot as plt
import datetime

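# mock dates; replace with your own list of datetime objects
dates = []
for i in range(1, 12):
    dates.append(datetime.datetime(i*5+1960, i, i, i, i, i))

# move every date into the same (leap) year so the months line up
stacked_dates = []
for date in dates:
    stacked_dates.append( date.replace(year=2000)  )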

stacked_dates = mpl.dates.date2num(stacked_dates)

fig = plt.figure()
ax = plt.subplot(111)
ax.xaxis.set_major_locator(mpl.dates.MonthLocator())
format = mpl.dates.DateFormatter('%m/%d')
ax.xaxis.set_major_formatter(format)

ax.hist(stacked_dates)

plt.xticks(rotation='vertical')  
plt.show()

Thank you for your help. I should have said what nc is (again, I'm new, so my apologies). I imported the netCDF4 module as nc in my code and did this using netCDF4's num2date function. My reason for using it is that the calendar this file uses is Julian and not Gregorian (as mpl uses). Is there a way to use mpl without skewing the dates by a day or so because of the calendar difference? — LonelyHeartsClub, Feb 20 '15 at 07:13
Just a quick follow-up: I think the calendar difference is preventing me from moving past the stacking step, as even when converting my times to dates using mpl, I keep getting the error "ValueError: day is out of range for month." — LonelyHeartsClub, Feb 20 '15 at 07:29
@LonelyHeartsClub Ah, I didn't know nc has that same function. As for your first question, yes. Post some code and let me see how you convert it. As far as I saw, the netcdftime module is just a thinly wrapped datetime module. I would recommend doing the conversion explicitly, e.g. `for date in ncdate: list.append(datetime(year=date.year .... day=date.day-1))`. This gives you the option to have `if` conditions etc. to control exactly how the conversion happens. The only problem I see is dates before 1500. However, if you're not using old info, no problem. — ljetibo, Feb 20 '15 at 10:49
Sure, I'll post exactly what I have done to achieve this list. Also, I do in fact have years before 1500 because this is hypothetical data (randomly generated model output) so my years start at year 1. `import netCDF4 as nc; import numpy as np;` `flor = nc.Dataset('FLOR.slp_subset1.nc','r')` `times = []; timecounter = .25` `for i in restrictedrange: ` `for j in np.nditer(i):` `if(j <= 975):` `times.append(timecounter)` `timecounter += .25` `uniquetimes = rmduplicates(times)` Times is a list of values like .25,.50 that correspond to how many days after 1/1/0001 — LonelyHeartsClub, Feb 20 '15 at 17:06
@LonelyHeartsClub please edit your original post accordingly, comments don't have the same editor powers. Also this seems to be the exact same code you posted in your original question. I'd also like to know what the return type of `nc.num2date` is; hopefully it's a float number. If it is, you can do everything I did, except you have to make sure you use the `nc.datetime` module to stack dates and convert them to numbers with `date2num`. `nc` should make corrections itself, and the end result is float, which `mpl` interprets well. 1/2 — ljetibo, Feb 20 '15 at 17:17
@LonelyHeartsClub 2/2 `date2num` returns number of days between 2 dates, and fractions are h/m/s. If you have a date difference, the calendar you used doesn't make a difference anymore. Both calendars give consistent date differences. Jdate1-Jdate2 = ndays = Gdate1-Gdate2. As long as the difference is calculated by the rules of the calendar (skip years etc...) which `nc` handles for you, there should be no differences. Just make sure you handle calculating the difference with `nc` — ljetibo, Feb 20 '15 at 17:22
I think I understand my issue. netCDF4's num2date function returns (as they say in their documentation) "phony" datetime objects with reduced functionality if the calendar isn't Gregorian. So, when I call replace on each date element in my dates[], I'm getting the error that date has no attribute .replace. And sorry, I'll edit my code! — LonelyHeartsClub, Feb 20 '15 at 17:26
@LonelyHeartsClub Maybe however they defined getters and setters for values? Have you tried `dates[i].day = 1`? — ljetibo, Feb 20 '15 at 17:28
@LonelyHeartsClub That's much much better after the edit! I can see your problem now! You are converting dates to numbers before you stacked them! If you want to have a histogram of **cumulative** data by month you have to stack them to the same year **before** you convert them to num! Also the return of `nc.date2num` is defined as "Return numeric time values given datetime objects" and not `datetime` object. If you don't need a cumulative histogram, just convert them to num and plot that. Also please post the error message you get currently (if you have one?) — ljetibo, Feb 20 '15 at 17:34
I'm a bit confused, as my stacked_dates conversion occurs before I call the date2num function. The error I get from running my code currently (as presented above) is `AttributeError: 'datetime' object has no attribute 'replace'` . Also after trying using `dates[i].day = 1`, I get the error `AttributeError: can't set attribute` — LonelyHeartsClub, Feb 20 '15 at 17:48
Ah, sorry, I misread, that's `num2date` not `date2num`. Well, that's it then. If they haven't defined setters or the replace function, you're pretty much stuck with only one option. You have to use the numerical dates, determine the exact difference between that date and the year you're stacking to (e.g. stack everything on 2000), and then add or subtract that value to the numerical value of the date you're trying to stack. So if there have been, say, 2365 days from 1965 to 2000, you add 2365 to date2num(1965). Condition: corrections for the Julian calendar pre-1500 must be applied for years <1600, manually. — ljetibo, Feb 20 '15 at 17:59
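
The explicit field-by-field conversion suggested earlier in the comments can be sketched as follows. This is only a sketch under assumptions: PhonyDate is a hypothetical stand-in for date objects that expose year, month, day, hour, minute and second but lack replace(), and stack_fields is an illustrative name. The month and day labels are taken at face value, with no Julian-to-Gregorian shift applied.

```python
import collections
import datetime

# Hypothetical stand-in for reduced-functionality date objects without .replace().
PhonyDate = collections.namedtuple(
    "PhonyDate", "year month day hour minute second")

def stack_fields(dates, year=2000):
    """Build real datetime objects in `year` from each date's fields."""
    return [
        datetime.datetime(year, d.month, d.day, d.hour, d.minute, d.second)
        for d in dates
    ]

# Made-up sample values, including a 29 February from an early leap year.
phony = [
    PhonyDate(1, 1, 1, 6, 0, 0),
    PhonyDate(4, 2, 29, 12, 0, 0),
    PhonyDate(120, 7, 15, 18, 0, 0),
]

# 2000 is a leap year, so 29 February can be represented.
stacked = stack_fields(phony)
```

The resulting list holds ordinary datetime objects, so it can be passed to mpl.dates.date2num and plotted as in the answer.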
