I have a data.table object:
> Hydro_Sen
         Index      Date Obs_m3_s T_str P_factor Flow_m3_s     Gauge Month Year  T  Normalised
     1:      0 1/04/2000 13.37000  T_-1     0.95 28.987400 Aconcagua     4 2000 -1  0.08409943
     2:      1 1/05/2000  9.94387  T_-1     0.95 15.542100 Aconcagua     5 2000 -1 -0.59053122
     3:      2 1/06/2000 13.80530  T_-1     0.95 19.139900 Aconcagua     6 2000 -1 -0.41000821
   ---
165238: 165237 1/01/2018       NA   T_4     1.40  0.593462   Juncal2     1 2018  4 -1.34059328
165239: 165238 1/02/2018       NA   T_4     1.40  0.403063   Juncal2     2 2018  4 -1.35014673
165240: 165239 1/03/2018       NA   T_4     1.40  0.252990   Juncal2     3 2018  4 -1.35767678
> str(Hydro_Sen)
Classes ‘data.table’ and 'data.frame': 165240 obs. of 11 variables:
$ Index : int 0 1 2 3 4 5 6 7 8 9 ...
$ Date : chr "1/04/2000" "1/05/2000" "1/06/2000" "1/07/2000" ...
$ Obs_m3_s : num 13.37 9.94 13.81 23.18 21.87 ...
$ T_str : chr "T_-1" "T_-1" "T_-1" "T_-1" ...
$ P_factor : num 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 ...
$ Flow_m3_s : num 29 15.5 19.1 20.8 18.5 ...
$ Gauge : chr "Aconcagua" "Aconcagua" "Aconcagua" "Aconcagua" ...
$ Month : int 4 5 6 7 8 9 10 11 12 1 ...
$ Year : int 2000 2000 2000 2000 2000 2000 2000 2000 2000 2001 ...
$ T : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ Normalised: num 0.0841 -0.5905 -0.41 -0.3283 -0.4442 ...
- attr(*, ".internal.selfref")=<externalptr>
I am trying to create a new column, Normalised2, by normalising an existing column (Flow_m3_s) with the mean and standard deviation of another column (Obs_m3_s). The mean and standard deviation are calculated separately for each value of another column (Gauge), using only the baseline rows where T_str == "T_0" and P_factor == 1.
I tried the following:
for(i in unique(Hydro_Sen$Gauge)){
  Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
  Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-mean(Temp, na.rm=T))/sd(Temp, na.rm=T)]
}
but I get the following warning:
Warning messages:
1: In if (na.rm) x <- x[!is.na(x)] :
the condition has length > 1 and only the first element will be used
I checked other posts (e.g. "the condition has length > 1 and only the first element will be used in if else statement" and "The condition has length > 1 and only the first element will be used", amongst others) and saw that the problem usually arises when R is forced to evaluate an if condition on a vector. However, I am not sure how this applies to my case. I checked that the calculation of the mean and sd was OK:
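The pattern those posts describe can be reproduced in a few lines. The sketch below uses a hypothetical function, my_mean, that mirrors the line quoted in the warning (if (na.rm) x <- x[!is.na(x)]); it is an illustration, not the real mean.default. Passing a single TRUE works, while passing a vector as na.rm makes if() see a condition of length > 1. Since R 4.2.0 this is an error rather than a warning, so both cases are caught.

```r
# Hypothetical stand-in mirroring the line quoted in the warning:
#   if (na.rm) x <- x[!is.na(x)]
my_mean <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sum(x) / length(x)
}

x <- c(1, NA, 3)

# A single TRUE is fine: the NA is dropped before averaging
ok <- my_mean(x, na.rm = TRUE)

# A length-3 vector reaches if(): a warning before R 4.2.0, an error from 4.2.0 on.
# Either condition is caught and its message returned.
msg <- tryCatch(
  my_mean(x, na.rm = c(TRUE, FALSE, TRUE)),
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)

print(ok)
print(msg)
```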
> mean(Temp, na.rm=T)
[1] 7.093052
> length(mean(Temp, na.rm=T))
[1] 1
> str(mean(Temp, na.rm=T))
num 7.09
The warning disappears if I remove na.rm=T from the mean and sd calls, but then the result is NA. I found a workaround:
for(i in unique(Hydro_Sen$Gauge)){
  Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
  Temp2=mean(Temp, na.rm=T)
  Temp3=sd(Temp, na.rm=T)
  Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-Temp2)/Temp3]
}
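The workaround loop above can also be written without a loop. This is a sketch under assumptions, not the original code. toy is a small made-up table containing only the columns involved, and stats, mu and sigma are hypothetical names. The baseline mean and standard deviation of Obs_m3_s are computed per Gauge from the T_str == "T_0" & P_factor == 1 rows. They are then attached to every row of the same gauge with a data.table update join (on = "Gauge").

```r
library(data.table)

# Hypothetical toy data with the same column roles as Hydro_Sen (values made up)
toy <- data.table(
  Gauge     = rep(c("A", "B"), each = 4),
  T_str     = rep(c("T_0", "T_0", "T_1", "T_1"), times = 2),
  P_factor  = 1,
  Obs_m3_s  = c(1, 3, NA, NA, 10, 20, NA, NA),
  Flow_m3_s = c(2, 4, 6, 8, 15, 25, 30, 35)
)

# Baseline statistics per gauge, from the T_0 / P_factor == 1 rows only
stats <- toy[T_str == "T_0" & P_factor == 1,
             .(mu    = mean(Obs_m3_s, na.rm = TRUE),
               sigma = sd(Obs_m3_s, na.rm = TRUE)),
             by = Gauge]

# Update join: every row of toy gets the mu/sigma of its own gauge (i.* columns)
toy[stats, on = "Gauge", Normalised2 := (Flow_m3_s - i.mu) / i.sigma]

print(toy)
```

Each gauge's statistics are computed once, and rows whose gauge has no baseline rows keep NA in Normalised2.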
But I would like to understand why the first version generates the warning. Any ideas on how to deal with this?
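One likely explanation, assuming nothing else in the session has redefined T, is that Hydro_Sen has a column named T. Inside the j expression of Hydro_Sen[...], data.table looks up symbols among the table's columns before the enclosing environment. So na.rm=T there receives the integer T column of the selected rows (a vector) instead of TRUE, and the if (na.rm) inside mean/sd sees a condition of length > 1. In the workaround, mean and sd are called outside the brackets, where T is still base R's shortcut for TRUE. The toy table below is hypothetical and only illustrates that lookup. Writing TRUE, which is a reserved word and cannot be masked by a column, avoids the clash.

```r
library(data.table)

# Hypothetical toy table that, like Hydro_Sen, has a column called T
toy <- data.table(Gauge = "A", T = c(-1L, 0L, 4L), Flow = c(1, 2, 3))

# Inside j, the symbol T resolves to the column, not to base R's TRUE
inside <- toy[, T]

# Outside the brackets, T is still the base R shortcut for TRUE
outside <- T

print(inside)
print(outside)

# TRUE is a reserved word, so no column can mask it
toy[, Normalised2 := (Flow - mean(Flow, na.rm = TRUE)) / sd(Flow, na.rm = TRUE)]
print(toy)
```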