I have a dataframe which is like the followng:
dat <- data.frame(participant = c(rep("01", times = 3), rep("02", times = 3)),
target = rep(c("1", "2", "3"), times = 2),
eucDist = c(0.06, 0.16, 0.89, 0.10, 0.11, 0.75),
eucDist2 = c(0.09, 0.04, 0.03, 0.05, 1.45, 0.09))
participant target eucDist eucDist2
1 01 1 0.06 0.09
2 01 2 0.16 0.04
3 01 3 0.89 0.03
4 02 1 0.10 0.05
5 02 2 0.11 1.45
6 02 3 0.75 0.09
I have run some code to identify outliers in the eucDist and eucDist2 columns, which I have saved in separate dataframes. Examples of these can be seen here:
outliers1 <- data.frame(participant = c("01", "02"),
target = c("1", "3"),
eucDist = c(0.06, 0.75),
eucDist2 = c(0.09, 0.09))
participant target eucDist eucDist2
1 01 1 0.06 0.09
2 02 3 0.75 0.09
outliers2 <- data.frame(participant = "01",
target = "1",
eucDist = 0.06,
eucDist2 = 0.09)
participant target eucDist eucDist2
1 01 1 0.06 0.09
The rows shown in Outliers1
indicate outliers in the eucDist column in dat
, and the row shown in Outliers2
indicates an outlier in the eucDist2 column.
I would like to replace the outlier values in the eucDist and eucDist2 columns of dat
with 'NA'. I do not want to remove whole rows because in many cases either the eucDist or eucDist2 values are usable, and removing the entire row would remove both variables.
Here is what I would like:
participant target eucDist eucDist2
1 01 1 NA NA
2 01 2 0.16 0.04
3 01 3 0.89 0.03
4 02 1 0.10 0.05
5 02 2 0.11 1.45
6 02 3 NA 0.09
I have been attempting this using conditional %in% statements, but can't quite get the phrasing correct and would really appreciate some help. Here is my non-working code:
library(naniar)
dat1 <- if (dat$eucDist %in% Outliers1$eucDist) {
replace_with_na_all(dat$eucDist)
}