1

I have a Data frame like this

Temp_in_C   Temp_in_F   Date          Year   Month   Day
   23          65       2011-12-12     2011    12     12
   12          72       2011-12-12    2011     12      12
   NaN         67       2011-12-12     2011    12      12
   0           0        2011-12-12     2011    12      12
   7           55       2011-12-13     2011    12       13

I am trying to get output in this format (The NaN and zero values of pertuculer day is replaced by avg temp of that day only) Output will be

Temp_in_C   Temp_in_F   Date          Year   Month   Day
   23          65       2011-12-12     2011    12     12
   12          72       2011-12-12    2011     12      12
   17.5        67       2011-12-12     2011    12      12
   17.5        68       2011-12-12     2011    12      12
   7           55       2011-12-13     2011    12       13

These vales will be replaced by mean of that perticuler day. I am trying to do this

temp_df = csv_data_df[csv_data_df["Temp_in_C"]!=0]
temp_df["Temp_in_C"] = 
temp_df["Temp_in_C"].replace('*',np.nan)
x=temp_df["Temp_in_C"].mean()   
csv_data_df["Temp_in_C"]=csv_data_df["Temp_in_C"]
.replace(0.0,x)
csv_data_df["Temp_in_C"]=csv_data_df["Temp_in_C"]
.fillna(x)

This code is taking the mean of whole columns and replacing it directly. How can i group by day and take mean and then replace values for that particular day only.

PriyalChaudhari
  • 363
  • 1
  • 7
  • 23
  • Doesn't this result in corrupt data? For example your 17.5 C ends up next to two different Fahrenheit values (neither of which is correct; they should both be 63.5). – John Zwinck Jul 02 '17 at 08:10
  • @JohnZwinck those values are not calculated i wrote them just for understanding my problem. Not actual values in data. I just wanted to define my problem . Im ideal case they will be same – PriyalChaudhari Jul 02 '17 at 08:22

1 Answers1

0

First, replace zeros with NaN

df = df.replace(0,np.nan)

Then fill the missing values using transform (see this post)

df.groupby('Date').transform(lambda x: x.fillna(x.mean()))

Gives:

   Temp_in_C  Temp_in_F  Year  Month  Day
0       23.0       65.0  2011     12   12
1       12.0       72.0  2011     12   12
2       17.5       67.0  2011     12   12
3       17.5       68.0  2011     12   12
4        7.0       55.0  2011     12   13
VinceP
  • 2,058
  • 2
  • 19
  • 29