1

I have the following DataFrame :

                         H     T       date
date                                       
1990-08-26 11:30:00   38.0  11.6 1990-08-26
1990-08-26 11:30:00   63.0  11.3 1990-08-26
1990-08-26 11:30:00   87.0  10.9 1990-08-26
1990-08-26 11:30:00  111.0  10.6 1990-08-26
1990-08-26 11:30:00  134.0  10.4 1990-08-26
1990-08-26 11:30:00  154.0  10.1 1990-08-26
1990-08-26 11:30:00  178.0   9.9 1990-08-26
1990-08-26 11:30:00  205.0   9.6 1990-08-26
1990-08-26 11:30:00  233.0   9.4 1990-08-26
1990-08-26 11:30:00  260.0   9.2 1990-08-26

Where T is temperature and H is height in meters. I want to count how many days are in each month ( and in each year) and then do this as a bar plot. So what I do is the following ( the code is below)

df = pd.read_csv('/radiosonde_Iceland_analysis.   
/data/H_T_series_1991_2016',sep = "\t")
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df['date'] = df.index.date
df['date'] = pd.to_datetime(df['date'], errors='coerce')
print(df.head(10))
  df_count=df.groupby([df.date.dt.year,df.date.dt.month,df.date.dt.day]).count()
df_count=df_count[df_count['date']>1991]
print(df_count)
fig,ax = plt.subplots()

plt.xticks(rotation=90)
ax.set_xlabel('Year')
ax.set_ylabel('Nr of observations(vertical points) per   
year')   
df_count['H'].plot(kind='bar',stacked=True,width=0.6,colormap='Paired',alpha=0.7)
plt.savefig('count_heights_ave_1991_2016.png',dpi=200)
plt.show()

First, how can I get the total number of ocurences of T for each month ? What I get is something like:

1992 1 2 2113 2111 2113 4 2148 2146 2148 5 2028 2027 2028 12 2044 2042 2044 19 2361 2361 2361 21 2061 2061 2061 22 2014 2014 2014 23 2008 2008 2008 24 2161 2161 2161 27 2024 2023 2024 29 2374 2373 2374 4 3 2025 2024 2025 1995 7 11 2009 2009 2009 2006 1 1 4593 4593 4593 2 4870 4870 4870 3 4249 4249 4249 4 4761 4761 4761 5 4889 4889 4889 6 2380 2380 2380 7 4504 4504 4504 8 4828 4828 4828 9 4933 4933 4933 but I would like the monthly totals.

Second, I would want the years on the x axis labels but since the index is defined as yy-mm-dd-hh etc I get all the labels crammed in there ( as in the figure attached - is all black as you can see). Can you please tell me what is the way to plot those axes correctly and only have tick marks for years ( not for the months).

enter image description here

Brad Solomon
  • 38,521
  • 31
  • 149
  • 235

1 Answers1

0

Some test data:

import numpy as np
import pandas as pd

np.random.seed(444)

start = '1990-01-01'
end = '2017-12-31'
idx = pd.date_range(start, end, freq='30min')

# different number of repeats per 30-min tick
rpt = np.random.randint(1, 5, size=idx.size)
idx = np.repeat(idx, rpt)
df = pd.DataFrame({'T': np.random.randn(idx.size)}, index=idx)
df['date'] = df.index.date
df.index.name = 'date'

Snippet:

>>> df.head()
                            T        date
date                                     
1990-01-01 00:00:00 -0.335715  1990-01-01
1990-01-01 00:00:00  0.867022  1990-01-01
1990-01-01 00:00:00 -0.503262  1990-01-01
1990-01-01 00:30:00 -0.543694  1990-01-01
1990-01-01 01:00:00  2.067549  1990-01-01

And your questions:

First, how can I get the total number of occurrences of T for each month?

I'll assume your looking for the occurrences for each year, month combination. You can use .groupby() for that:

>>> counts = df.groupby(by=[df.index.year, df.index.month])['T'].count()

>>> counts.head()
date  date
1990  1       3750
      2       3357
      3       3626
      4       3539
      5       3790
Name: T, dtype: int64

>>> counts.tail()
date  date
2017  8       3711
      9       3611
      10      3649
      11      3689
      12      3557
Name: T, dtype: int64

Note that .count() is the number of non-null observations.

What is the way to plot those axes correctly and only have tick marks for years (not for the months)?

This is tricker, although mine is probably not the smartest solution. (I think you should be able to use MonthFormatter as well.)

allyrs = counts.index.get_level_values(0)
uyrs = allyrs.unique()
mask = np.zeros_like(uyrs)
mask[1:] = np.where(allyrs[1:] != allyrs[:-1])[0]

counts.plot(kind='bar')
plt.xticks(mask, yrs)
plt.title('Obs. Count by Year/Month')
plt.xlabel('Year-Month')
plt.ylabel('Count')

Result:

enter image description here

Brad Solomon
  • 38,521
  • 31
  • 149
  • 235