I have a data frame of z-scores. I want to replace with NA only those values that are greater than or equal to 4, without dropping any rows or columns. I would appreciate an answer.
Best
You can use the following code:
df <- data.frame(v1 = c(1, 3, 6, 7, 3),
                 v2 = c(2, 1, 4, 6, 7),
                 v3 = c(1, 2, 3, 4, 5))
df
#> v1 v2 v3
#> 1 1 2 1
#> 2 3 1 2
#> 3 6 4 3
#> 4 7 6 4
#> 5 3 7 5
is.na(df) <- df >= 4
df
#> v1 v2 v3
#> 1 1 2 1
#> 2 3 1 2
#> 3 NA NA 3
#> 4 NA NA NA
#> 5 3 NA NA
Created on 2022-07-10 by the reprex package (v2.0.1)
You can simply use df[df >= 4] <- NA to achieve what you want.
df <- data.frame(replicate(10,sample(0:10,10,rep=TRUE)))
> df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 2 3 4 5 6 4 3 1 10 6
2 5 7 0 4 3 10 10 3 6 10
3 5 5 0 3 1 3 5 7 2 7
4 7 0 4 1 10 0 5 2 5 0
5 8 8 7 8 4 6 6 10 10 0
6 1 4 1 3 3 8 8 0 4 8
7 6 3 3 6 7 4 10 9 7 2
8 2 1 4 0 7 8 10 1 6 3
9 0 9 6 2 9 6 2 9 0 3
10 8 2 1 0 1 4 0 6 2 8
df[df>=4] <- NA
> df
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 2 3 NA NA NA NA 3 1 NA NA
2 NA NA 0 NA 3 NA NA 3 NA NA
3 NA NA 0 3 1 3 NA NA 2 NA
4 NA 0 NA 1 NA 0 NA 2 NA 0
5 NA NA NA NA NA NA NA NA NA 0
6 1 NA 1 3 3 NA NA 0 NA NA
7 NA 3 3 NA NA NA NA NA NA 2
8 2 1 NA 0 NA NA NA 1 NA 3
9 0 NA NA 2 NA NA 2 NA 0 3
10 NA 2 1 0 1 NA 0 NA 2 NA
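Since the question is about z-scores, here is a minimal base R sketch of the same replacement on scaled data. The data frame `raw`, its columns, and the helper variable `num_cols` are hypothetical and only serve as illustration. The sketch assumes the data frame may also contain non-numeric columns (here an ID column), which should be left untouched, so the comparison is restricted to numeric columns.

```r
# Hypothetical data: a character ID column plus two numeric columns,
# one of which contains a single extreme value.
raw <- data.frame(
  id = letters[1:20],
  a  = c(rep(0, 19), 100),
  b  = 1:20
)

# Flag the numeric columns so the ID column is never compared with 4.
num_cols <- vapply(raw, is.numeric, logical(1))

# Convert the numeric columns to z-scores (scale() returns a matrix,
# so as.numeric() turns each result back into a plain vector).
z <- raw
z[num_cols] <- lapply(raw[num_cols], function(col) as.numeric(scale(col)))

# Replace z-scores greater than or equal to 4 with NA, column by column.
z[num_cols] <- lapply(z[num_cols], function(col) replace(col, col >= 4, NA))

z
```

The number of rows and columns stays the same; only the flagged cells become NA. If large negative z-scores should be treated the same way, the condition could be changed to abs(col) >= 4, although that goes beyond what the question asks.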
Here is one more, using replace_with_na_all() from the naniar package. From the package vignette:
Use replace_with_na_all() when you want to replace ALL values that meet a condition across an entire dataset. The syntax here is a little different, and follows the rules for rlang's expression of simple functions. This means that the function starts with ~, and when referencing a variable, you use .x.
https://cran.r-project.org/web/packages/naniar/vignettes/replace-with-na.html
library(naniar)
library(dplyr)
# df is the v1/v2/v3 data frame from the first answer
df %>%
  replace_with_na_all(condition = ~ .x >= 4)
v1 v2 v3
<dbl> <dbl> <dbl>
1 1 2 1
2 3 1 2
3 NA NA 3
4 NA NA NA
5 3 NA NA
Although the solution by @Quinten is very concise, here is an additional approach using the tidyverse:
library(dplyr)
set.seed(123)
df <- data.frame(
  x = sample(1:10, 7),
  y = sample(1:10, 7)
)
# Name the columns explicitly: calling across() without .cols is deprecated in recent dplyr
df %>%
  mutate(
    across(everything(), ~ if_else(.x >= 4, NA_integer_, .x))
  )
#> x y
#> 1 3 NA
#> 2 NA NA
#> 3 2 1
#> 4 NA 2
#> 5 NA 3
#> 6 NA NA
#> 7 1 NA
In base R, we can use replace():
df <- replace(df, df >= 4, NA_real_)
Output
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 NA NA 3 NA 1 3 1 1 NA NA
2 1 NA 2 NA NA 3 NA NA 2 0
3 NA 1 NA 2 2 1 NA NA NA 1
4 NA NA 0 NA NA NA 0 2 NA NA
5 NA 1 NA 3 0 NA NA NA 2 3
6 0 3 NA 0 NA NA 1 1 NA 2
7 3 NA NA NA 2 2 NA 2 NA NA
8 NA 1 0 2 NA NA 2 NA NA NA
9 NA 3 NA 2 NA NA NA 0 1 3
10 1 3 NA 3 NA NA 3 NA NA NA
Or use replace() with dplyr:
library(dplyr)
df %>%
  mutate(across(everything(), ~ replace(.x, .x >= 4, NA_real_)))
Data
set.seed(321)
df <- data.frame(replicate(10, sample(0:10, 10, rep = TRUE)))
If the columns are numeric, another option is to apply ^ to the logical matrix (df >= 4). NA^TRUE is NA and NA^FALSE is 1, so this yields NA where the condition holds and 1 elsewhere. Multiplying that by the original data then returns NA for the flagged elements and the original value for all others:
NA^(df >= 4) * df
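To make the mechanism easier to follow, the one-liner can be split into its intermediate steps. This is only an illustrative sketch that reuses the small v1/v2/v3 data frame from the first answer; the names `mask`, `na_or_one`, and `result` are introduced here for illustration.

```r
df <- data.frame(v1 = c(1, 3, 6, 7, 3),
                 v2 = c(2, 1, 4, 6, 7),
                 v3 = c(1, 2, 3, 4, 5))

# Step 1: logical matrix, TRUE where a value is greater than or equal to 4.
mask <- df >= 4

# Step 2: TRUE counts as 1 and FALSE as 0, so NA^TRUE is NA and NA^FALSE is 1.
na_or_one <- NA^mask

# Step 3: multiplying by NA gives NA, multiplying by 1 keeps the original value.
result <- na_or_one * df

result
```

Because the approach relies on arithmetic, it only applies when every column is numeric; for mixed-type data one of the replacement-based answers is the safer choice.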