0

I m having a dataframe with dates from 2006 to 2016 and for each date 7 corresponding values. The data is like below:

                   H     PS    T    RH   TD   WDIR   WSP
date                                                        
2006-01-01 11:28:00   38  988.6  0.9  98.0  0.6  120.0  14.4
2006-01-01 11:28:00   46  987.6  0.5  91.0 -0.7  122.0  15.0
2006-01-01 11:28:00   57  986.3  0.5  89.0 -1.1  124.0  15.5
2006-01-01 11:28:00   66  985.1  0.5  90.0 -1.1  126.0  16.0
2006-01-01 11:28:00   74  984.1  0.4  90.0 -1.1  127.0  16.5
2006-01-01 11:28:00   81  983.3  0.4  90.0 -1.1  129.0  17.0

I want to select few columns for each year ( for example T and RH for all 2006) . So, for each year 2006 to 2016 select a bunch of columns then write each of the new dataframes in one file. I did the following:

df_H_T=(df[['RH','T']])
mask = (df_H_T['date'] >'2016-01-01 00:00:00') & (df_H_T['date']   <='2016-12-31 23:59:59')
df_H_T_2006 =df.loc[mask]
print(df_H_T_2006.head(20))
print(df_H_T_2006.tail(20)) 

But is not working because it seems it doesn't know what 'date' is but then when I print the head of the dataframe it seems date is there. What am I doing wrong ?

My second question is how can I put this in a loop over the year variable so that I don t write by hand each new dataframe and select one year at a time up to 2016 ? ( I m new and never used loops in python). Thanks, Ioana

  • you have to convert the dates to datetime objects. Otherwise you cannot slice them. See this post: https://stackoverflow.com/questions/32204631/how-to-convert-string-to-datetime-format-in-pandas-python – onno Sep 21 '18 at 20:47
  • Is `date` a regular column, or is it the index? – Acccumulation Sep 21 '18 at 20:59

1 Answers1

0

date is in the original dataframe, but then you take df_H_T=df[['RH','T']], so now date isn't in df_H_T. You can use masks generated from one dataframe to slice another, as long as they have the same index. So you can do

mask = (df['date'] >'2016-01-01 00:00:00') & (df['date']   <='2016-12-31 23:59:59')
df_H_T_2006 =df_H_T.loc[mask]

(Note: you're applying the mask to df, but presumably you want to apply it to df_H_T).

If date is in datetime format, you can just take df['date'].apply(lamda x: x.year==2016). For your for-loop, it would be

df_H_T=(df[['RH','T']])
for year in years:
    mask = df['date'].apply(lamda x: x.year==year)
    df_H_T_cur_year =df_H_T.loc[mask]
    print(df_H_T_cur_year.head(20))
    print(df_H_T_cur_year.tail(20)) 
Acccumulation
  • 3,491
  • 1
  • 8
  • 12