2

I have a CSV file with the following structure : Grid Cell, Date, Station Temperature

My goal is to only keep those years with a full complement of months, and drop years with less than 12 months.

I have figured out how to identify which years have a full complement of months, but I am unsure of how to apply this to the original dataset to subset my data:

import numpy as np
import pandas as pd


Grid_Cells = [4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719,4719
              ,4719,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935,4935]

Dates = ['01-09-2008','01-10-2008','01-11-2008','01-12-2008','01-01-2009','01-02-2009','01-03-2009',
         '01-04-2009','01-05-2009','01-06-2009','01-07-2009','01-08-2009','01-09-2009','01-10-2009',
         '01-11-2009','01-12-2009','01-01-2013','01-02-2013','01-03-2013','01-04-2013','01-05-2013',
         '01-06-2013','01-07-2013','01-08-2013','01-09-2013','01-10-2013','01-11-2013','01-12-2013']

Temps = [1.97479861111111,-0.391396505,-1.784091667,-6.509092742,-11.81903226,-14.34362798,-12.40221774,
         -9.133213889,-1.039681452,0.907477778,3.54647043,3.893416667,2.161473611,0.015456989,-1.567216667,
         -4.373807796,-3.63483871,-7.023452381,-6.49688172,-5.683111111,-1.053548387,7.404777778,9.015913978,
         8.415376344,2.605666667,0.597096774,-2.949,-4.595483871]



dframe = pd.DataFrame(data=Grid_Cells,columns=['Grid Cells'])
dframe['Date'] = Dates
dframe['Station Temperature'] = Temps
dframe['DateTime'] = pd.to_datetime(dframe['Date'])
dt_vals = dframe['DateTime']
dframe['Year'] = [i.year for i in dt_vals]
dframe['Month'] = [i.month for i in dt_vals]
    
months_in_year = dframe.groupby(['Grid Cells','Year'])['Month'].count() #count number of months in year
    
subset_months = months_in_year.drop(months_in_year[months_in_year < 12].index) #drop years with less than 12 months

But I have been having issues figuring out how to use subset_months to subset my dframe to select only those years with less than 12 months (since it is multi-index with both Grid Cell and Year as an index). Does anyone have a suggestion?

1 Answers1

1

You were very close. Do an inner join with subset_months and dframe on ['Grid Cells', 'Year'] and you'll have only the values you want.

dframe = dframe.merge(subset_months.reset_index()[['Grid Cells', 'Year']],
                      how = 'inner', 
                      on  = ['Grid Cells', 'Year']
                     )
    Grid Cells        Date  Station Temperature   DateTime  Year  Month
0         4719  01-01-2009           -11.819032 2009-01-01  2009      1
1         4719  01-02-2009           -14.343628 2009-01-02  2009      1
2         4719  01-03-2009           -12.402218 2009-01-03  2009      1
3         4719  01-04-2009            -9.133214 2009-01-04  2009      1
4         4719  01-05-2009            -1.039681 2009-01-05  2009      1
5         4719  01-06-2009             0.907478 2009-01-06  2009      1
6         4719  01-07-2009             3.546470 2009-01-07  2009      1
7         4719  01-08-2009             3.893417 2009-01-08  2009      1
8         4719  01-09-2009             2.161474 2009-01-09  2009      1
9         4719  01-10-2009             0.015457 2009-01-10  2009      1
10        4719  01-11-2009            -1.567217 2009-01-11  2009      1
11        4719  01-12-2009            -4.373808 2009-01-12  2009      1
12        4935  01-01-2013            -3.634839 2013-01-01  2013      1
13        4935  01-02-2013            -7.023452 2013-01-02  2013      1
14        4935  01-03-2013            -6.496882 2013-01-03  2013      1
15        4935  01-04-2013            -5.683111 2013-01-04  2013      1
16        4935  01-05-2013            -1.053548 2013-01-05  2013      1
17        4935  01-06-2013             7.404778 2013-01-06  2013      1
18        4935  01-07-2013             9.015914 2013-01-07  2013      1
19        4935  01-08-2013             8.415376 2013-01-08  2013      1
20        4935  01-09-2013             2.605667 2013-01-09  2013      1
21        4935  01-10-2013             0.597097 2013-01-10  2013      1
22        4935  01-11-2013            -2.949000 2013-01-11  2013      1
23        4935  01-12-2013            -4.595484 2013-01-12  2013      1
Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21