
I have a data.table object:

```
> Hydro_Sen
         Index      Date Obs_m3_s T_str P_factor Flow_m3_s     Gauge Month Year  T  Normalised
     1:      0 1/04/2000 13.37000  T_-1     0.95 28.987400 Aconcagua     4 2000 -1  0.08409943
     2:      1 1/05/2000  9.94387  T_-1     0.95 15.542100 Aconcagua     5 2000 -1 -0.59053122
     3:      2 1/06/2000 13.80530  T_-1     0.95 19.139900 Aconcagua     6 2000 -1 -0.41000821
---
165238: 165237 1/01/2018       NA   T_4     1.40  0.593462   Juncal2     1 2018  4 -1.34059328
165239: 165238 1/02/2018       NA   T_4     1.40  0.403063   Juncal2     2 2018  4 -1.35014673
165240: 165239 1/03/2018       NA   T_4     1.40  0.252990   Juncal2     3 2018  4 -1.35767678

> str(Hydro_Sen)
Classes ‘data.table’ and 'data.frame':  165240 obs. of  11 variables:
 $ Index     : int  0 1 2 3 4 5 6 7 8 9 ...
 $ Date      : chr  "1/04/2000" "1/05/2000" "1/06/2000" "1/07/2000" ...
 $ Obs_m3_s  : num  13.37 9.94 13.81 23.18 21.87 ...
 $ T_str     : chr  "T_-1" "T_-1" "T_-1" "T_-1" ...
 $ P_factor  : num  0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 ...
 $ Flow_m3_s : num  29 15.5 19.1 20.8 18.5 ...
 $ Gauge     : chr  "Aconcagua" "Aconcagua" "Aconcagua" "Aconcagua" ...
 $ Month     : int  4 5 6 7 8 9 10 11 12 1 ...
 $ Year      : int  2000 2000 2000 2000 2000 2000 2000 2000 2000 2001 ...
 $ T         : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Normalised: num  0.0841 -0.5905 -0.41 -0.3283 -0.4442 ...
 - attr(*, ".internal.selfref")=<externalptr> 
```

and I am trying to create a new column (Normalised2) by normalising an existing column (Flow_m3_s) with the mean and standard deviation of another column (Obs_m3_s). The mean and standard deviation are calculated separately for each value of another column (Gauge), using only the rows where T_str is "T_0" and P_factor is 1.

I tried the following:

```r
for(i in unique(Hydro_Sen$Gauge)){
  Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
  Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-mean(Temp, na.rm=T))/sd(Temp, na.rm=T)]
}
```

but I get the following warning:

```
Warning messages:
1: In if (na.rm) x <- x[!is.na(x)] :
  the condition has length > 1 and only the first element will be used
```

I checked other posts (among others, "the condition has length > 1 and only the first element will be used in if else statement" and "The condition has length > 1 and only the first element will be used") and saw that the problem usually arises when R is forced to evaluate an `if` condition on a vector. However, I am not sure how this applies to my case. I checked that the calculation of the mean and sd was OK:

```
> mean(Temp, na.rm=T)
[1] 7.093052
> length(mean(Temp, na.rm=T))
[1] 1
> str(mean(Temp, na.rm=T))
 num 7.09
```
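Note that this check runs at the top level, where `T` is base R's predefined shorthand for `TRUE`. Inside `Hydro_Sen[...]`, the table's columns are in scope, so `T` can resolve to the integer column `T` instead. The base-R sketch below reproduces that name lookup with `with()`, which likewise evaluates an expression with a data frame's columns in scope; the data frame `toy_df` is hypothetical and only mimics the relevant columns.

```r
# Hypothetical toy data: a numeric column with an NA and an integer column named T
toy_df <- data.frame(Obs_m3_s = c(2, NA, 4), T = c(-1L, -1L, 0L))

# At the top level, T is base R's shorthand for TRUE
outside <- T

# Inside with(), column names are looked up first, so T is the integer column
inside <- with(toy_df, T)

# A vector like this (length > 1) is what na.rm = T hands to mean() and sd()
# when evaluated among the columns, matching the 'if (na.rm)' warning quoted above
length(inside)  # 3

# Writing TRUE in full avoids the lookup, because TRUE is a reserved word
with(toy_df, mean(Obs_m3_s, na.rm = TRUE))  # 3
```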

The warning disappears if I remove `na.rm=T` from the `mean` and `sd` calls, but then I get NA as the result. I found a solution by doing the following:

```r
for(i in unique(Hydro_Sen$Gauge)){
  Temp=Hydro_Sen[Gauge==i & T_str=="T_0" & P_factor==1]$Obs_m3_s
  Temp2=mean(Temp, na.rm=T)
  Temp3=sd(Temp, na.rm=T)
  Hydro_Sen[Gauge==i,Normalised2:=(Flow_m3_s-Temp2)/Temp3]
}
```
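For comparison, the same per-gauge normalisation can be written without a loop using data.table's `by` argument. This is a sketch that is not from the original thread, assuming the `data.table` package and the column names shown in `str(Hydro_Sen)`; the table `toy` is made up for illustration, and on the real data the same `[` call would be run on `Hydro_Sen`. It writes `TRUE` in full, because inside `[...]` a bare `T` would resolve to the column named `T`.

```r
library(data.table)

# Hypothetical toy table with the same column names as Hydro_Sen (values made up)
toy <- data.table(
  Gauge     = c("A", "A", "A", "A", "B", "B", "B"),
  T_str     = c("T_0", "T_0", "T_0", "T_1", "T_0", "T_0", "T_0"),
  P_factor  = c(1, 1, 1, 1, 1, 1, 0.95),
  T         = c(0L, 0L, 0L, 1L, 0L, 0L, 0L),
  Obs_m3_s  = c(2, 4, NA, NA, 10, 30, NA),
  Flow_m3_s = c(3, 5, 6, 7, 20, 40, 50)
)

# Within each gauge, pick the reference observations (T_0 scenario, P_factor 1)
# and normalise all Flow_m3_s values of that gauge with their mean and sd.
# na.rm = TRUE is spelled out: a bare T here would be the column T.
toy[, Normalised2 := {
  ref <- Obs_m3_s[T_str == "T_0" & P_factor == 1]
  (Flow_m3_s - mean(ref, na.rm = TRUE)) / sd(ref, na.rm = TRUE)
}, by = Gauge]

toy[, .(Gauge, Flow_m3_s, Normalised2)]
```

Grouping with `by` lets data.table subset the rows once per gauge, so the separate `Temp` lookup per iteration is no longer needed.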

But I would like to understand why the first version generates the warning. Any ideas on how to deal with this?

Juan Ossa
  • Don't shorten TRUE to T, especially when a column is named T (since the latter takes precedence). (With that change, I guess the warning makes sense, right?) – Frank May 15 '18 at 16:28
  • Thanks, that was the mistake. – Juan Ossa May 15 '18 at 22:44
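The point about `T` versus `TRUE` can be checked directly in base R. `T` is an ordinary variable defined in the base package (bound to `TRUE`), so any closer binding with the same name, such as a local variable or a column in scope, takes precedence, whereas `TRUE` is a reserved word that cannot be reassigned. A short sketch; the function `f` is hypothetical and only stands in for a scope that contains something named `T`:

```r
# T is just a variable in the base environment, bound to TRUE
base_T <- get("T", envir = baseenv())

# A closer binding named T shadows it, e.g. a local variable inside a function
f <- function() {
  T <- c(-1L, 0L, 4L)  # stands in for the integer column T
  T
}
shadowed <- f()

# TRUE is a reserved word: trying to assign to it fails
assign_TRUE <- tryCatch(eval(parse(text = "TRUE <- 0")),
                        error = function(e) "error")
```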
