I have a CSV file with the following structure : Grid Cell, Date, Station Temperature
My goal is to only keep those years with a full complement of months, and drop years with less than 12 months.
I have figured out how to identify which years have a full complement of months, but I am unsure of how to apply this to the original dataset to subset my data:
import numpy as np
import pandas as pd
Grid_Cells = [4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719
,4719,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935]
Dates = ['01-09-2008','01-10-2008','01-11-2008','01-12-2008','01-01-2009','01-02-2009','01-03-2009',
'01-04-2009','01-05-2009','01-06-2009','01-07-2009','01-08-2009','01-09-2009','01-10-2009',
'01-11-2009','01-12-2009','01-01-2013','01-02-2013','01-03-2013','01-04-2013','01-05-2013',
'01-06-2013','01-07-2013','01-08-2013','01-09-2013','01-10-2013','01-11-2013','01-12-2013']
Temps = [1.97479861111111,-0.391396505,-1.784091667,-6.509092742,-11.81903226,-14.34362798,-12.40221774,
-9.133213889,-1.039681452,0.907477778,3.54647043,3.893416667,2.161473611,0.015456989,-1.567216667,
-4.373807796,-3.63483871,-7.023452381,-6.49688172,-5.683111111,-1.053548387,7.404777778,9.015913978,
8.415376344,2.605666667,0.597096774,-2.949,-4.595483871]
dframe = pd.DataFrame(data=Grid_Cells,columns=['Grid Cells'])
dframe['Date'] = Dates
dframe['Station Temperature'] = Temps
dframe['DateTime'] = pd.to_datetime(dframe['Date'])
dt_vals = dframe['DateTime']
dframe['Year'] = [i.year for i in dt_vals]
dframe['Month'] = [i.month for i in dt_vals]
months_in_year = dframe.groupby(['Grid Cells','Year'])['Month'].count() #count number of months in year
subset_months = months_in_year.drop(months_in_year[months_in_year < 12].index) #drop years with less than 12 months
But I have been having issues figuring out how to use subset_months to subset my dframe to select only those years with less than 12 months (since it is multi-index with both Grid Cell and Year as an index). Does anyone have a suggestion?