I have a data.frame as follows. For every column, I want to assign a value of '1' if the value falls in the range between 2.5 percentile (>=) [second last column] and 97.5 percentile (<=) [last column]. If not, then I want to assign a value 2. At places where there are NAs, I want to let the NAs just be.
> df
S1 S2 S3 S4 S5 S6 2.5% 97.5%
Gene1 0.02 0.04 0.05 0.03 0.10 0.06 0.01 0.08
Gene1 0.04 0.04 0.04 0.06 0.03 0.04 0.03 0.09
Gene1 51.00 57.00 50.00 54.00 70.00 63.00 41.00 71.00
Gene1 0.46 0.35 0.28 0.41 0.26 0.29 0.21 0.45
Gene1 0.09 0.08 0.09 0.09 0.08 0.09 0.07 0.10
Gene1 46.80 44.60 48.40 45.30 40.90 46.10 36.69 49.20
Gene1 NA NA 20.30 14.10 12.10 15.50 8.72 25.61
Gene1 1.96 1.05 1.39 1.56 1.54 1.71 1.24 2.00
Gene1 0.53 0.52 0.61 0.83 0.73 0.37 0.36 0.91
Gene1 1.05 0.55 0.85 1.30 1.14 0.64 0.61 1.39
Gene1 22.31 28.88 26.75 25.08 26.29 23.34 19.76 33.44
So in the end, I want a data.frame that looks like this.
> df_updated
S1 S2 S3 S4 S5 S6
Gene1 1 1 1 1 2 1
Gene1 1 1 1 1 1 1
Gene1 1 1 1 1 1 1
Gene1 2 1 1 1 1 1
Gene1 1 1 1 1 1 1
Gene1 1 1 1 1 1 1
Gene1 NA NA 1 1 1 1
Gene1 1 2 1 1 1 1
Gene1 1 1 1 1 1 1
Gene1 2 2 1 1 1 1
Gene1 1 1 1 1 1 1
I tried something like follows, but ended up with an error. I learnt from here (Error in if/while (condition) {: missing Value where TRUE/FALSE needed) that this is related to the NAs that I have in the matrix, but I am not sure how to adapt by code for it, and to get the dataframe 'df_updated' I want.
df_updated <- as.data.frame(lapply(df, function(x) if (x>=df$`2.5%` & x<=df$`97.5%`) {x==1} else {x==2}))
Error in if (x >= df$`2.5%` & x <= df$`97.5%`) { :
missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Any help appreciated. Thank you.