I have a data frame that has been converted to z-scores. I want to replace with NA only those values that are greater than or equal to 4, without dropping any row or column. I would appreciate an answer.

Best

lotus

6 Answers

You can use the following code:

df <- data.frame(v1 = c(1,3,6,7,3),
                 v2 = c(2,1,4,6,7),
                 v3 = c(1,2,3,4,5))
df
#>   v1 v2 v3
#> 1  1  2  1
#> 2  3  1  2
#> 3  6  4  3
#> 4  7  6  4
#> 5  3  7  5
is.na(df) <- df >= 4
df
#>   v1 v2 v3
#> 1  1  2  1
#> 2  3  1  2
#> 3 NA NA  3
#> 4 NA NA NA
#> 5  3 NA NA

Created on 2022-07-10 by the reprex package (v2.0.1)
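
As a minimal sketch of the same replacement on values that look like z-scores (the data frame z and its numbers are made up for illustration, not taken from the question), note that the comparison is one-sided, so large negative scores are left alone:

```r
# Hypothetical z-scores, including negatives and a value exactly equal to 4
z <- data.frame(a = c(-0.8, 0.3, 4.0, 1.2),
                b = c(4.7, -4.5, 0.1, 3.99))

# Mark every cell that is >= 4 as NA; the dimensions stay the same
is.na(z) <- z >= 4
z

# Assumption, not part of the question: if extreme values in both
# directions should be masked, compare absolute values instead
z_abs <- data.frame(a = c(-0.8, 0.3, 4.0, 1.2),
                    b = c(4.7, -4.5, 0.1, 3.99))
is.na(z_abs) <- abs(z_abs) >= 4
z_abs
```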

Quinten

You can simply use df[df >= 4] <- NA to achieve what you want.

df <- data.frame(replicate(10, sample(0:10, 10, replace = TRUE)))

> df
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1   2  3  4  5  6  4  3  1 10   6
2   5  7  0  4  3 10 10  3  6  10
3   5  5  0  3  1  3  5  7  2   7
4   7  0  4  1 10  0  5  2  5   0
5   8  8  7  8  4  6  6 10 10   0
6   1  4  1  3  3  8  8  0  4   8
7   6  3  3  6  7  4 10  9  7   2
8   2  1  4  0  7  8 10  1  6   3
9   0  9  6  2  9  6  2  9  0   3
10  8  2  1  0  1  4  0  6  2   8



df[df>=4] <- NA

> df
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1   2  3 NA NA NA NA  3  1 NA  NA
2  NA NA  0 NA  3 NA NA  3 NA  NA
3  NA NA  0  3  1  3 NA NA  2  NA
4  NA  0 NA  1 NA  0 NA  2 NA   0
5  NA NA NA NA NA NA NA NA NA   0
6   1 NA  1  3  3 NA NA  0 NA  NA
7  NA  3  3 NA NA NA NA NA NA   2
8   2  1 NA  0 NA NA NA  1 NA   3
9   0 NA NA  2 NA NA  2 NA  0   3
10 NA  2  1  0  1 NA  0 NA  2  NA
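
If the data frame also holds non-numeric columns (an assumption; the question only mentions z-scores), comparing the whole data frame against 4 would also touch those columns and may produce unintended matches. A minimal sketch that restricts the replacement to numeric columns, using a hypothetical df_mixed:

```r
# Hypothetical mixed-type data: one character column plus two numeric ones
df_mixed <- data.frame(id = c("a", "b", "c"),
                       z1 = c(0.5, 4.2, -1.0),
                       z2 = c(4.0, 1.0, 5.0))

# Logical vector flagging which columns are numeric
num <- vapply(df_mixed, is.numeric, logical(1))

# Apply the threshold only within the numeric columns
df_mixed[num][df_mixed[num] >= 4] <- NA
df_mixed
```

The id column is left untouched because it is never part of the comparison.
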
jay.sf

Here is one more, using replace_with_na_all() from the naniar package:

  • Use replace_with_na_all() when you want to replace ALL values that meet a condition across an entire dataset. The syntax here is a little different, and follows the rules for rlang’s expression of simple functions. This means that the function starts with ~, and when referencing a variable, you use .x. https://cran.r-project.org/web/packages/naniar/vignettes/replace-with-na.html
library(naniar)
library(dplyr)

df %>% 
  replace_with_na_all(condition = ~.x >= 4)
   v1    v2    v3
  <dbl> <dbl> <dbl>
1     1     2     1
2     3     1     2
3    NA    NA     3
4    NA    NA    NA
5     3    NA    NA
TarJae

Though the solution by @Quinten is very concise, here is an alternative approach using the tidyverse:

library(dplyr)

set.seed(123)

df <- data.frame(
  x = sample(1:10, 7),
  y = sample(1:10, 7)
)

df %>% 
  mutate(
    across(everything(), ~ if_else(.x >= 4, NA_integer_, .x))
  )

#>    x  y
#> 1  3 NA
#> 2 NA NA
#> 3  2  1
#> 4 NA  2
#> 5 NA  3
#> 6 NA NA
#> 7  1 NA

Created on 2022-07-10 by the reprex package (v2.0.1)

shafee

In base R, we can use replace():

df <- replace(df, df >= 4, NA_real_)

Output

   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  NA NA  3 NA  1  3  1  1 NA  NA
2   1 NA  2 NA NA  3 NA NA  2   0
3  NA  1 NA  2  2  1 NA NA NA   1
4  NA NA  0 NA NA NA  0  2 NA  NA
5  NA  1 NA  3  0 NA NA NA  2   3
6   0  3 NA  0 NA NA  1  1 NA   2
7   3 NA NA NA  2  2 NA  2 NA  NA
8  NA  1  0  2 NA NA  2 NA NA  NA
9  NA  3 NA  2 NA NA NA  0  1   3
10  1  3 NA  3 NA NA  3 NA NA  NA

Or use replace in dplyr:

library(dplyr)

df %>%
  mutate(across(everything(), ~ replace(.x, .x >= 4, NA_real_)))

Data

set.seed(321)

df <- data.frame(replicate(10, sample(0:10, 10, replace = TRUE)))
AndrewGB

If the columns are numeric, another option is to apply ^ to the logical matrix (df >= 4): NA^TRUE is NA and NA^FALSE is 1. Multiplying the result by the original data then returns NA for the elements that met the condition and the original value everywhere else.

NA^(df >= 4) * df
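
To make the mechanics visible, here is a small sketch that builds the mask step by step. The data frame d and the names mask and res are hypothetical, chosen only for illustration:

```r
# NA raised to the power 0 (FALSE) is 1; NA raised to the power 1 (TRUE) is NA
NA^FALSE
NA^TRUE

# Hypothetical numeric data frame
d <- data.frame(v1 = c(1, 3, 6),
                v2 = c(2, 4, 7))

# Mask: NA where d >= 4, 1 elsewhere
mask <- NA^(d >= 4)

# Multiplying by the mask keeps values below 4 and turns the rest into NA
res <- mask * d
res
```

This relies on every column being numeric, since the multiplication is not defined for character or factor columns.
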
akrun