I have a data frame containing (in random places) a character value (say "foo") that I want to replace with NA.
What's the best way to do so across the whole data frame?
One way to nip this in the bud is to convert that string to NA when you first read the data in.
df <- read.csv("file.csv", na.strings = c("foo", "bar"))
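To see the read-time approach in action without a file on disk, here is a minimal sketch that feeds an inline CSV to read.csv through its text argument. The CSV contents and the name df_read are made up for illustration.

```r
# Hypothetical inline CSV standing in for "file.csv"
csv_text <- "id,x,y
1,a,foo
2,foo,d
3,b,bar"

# Every field equal to "foo" or "bar" is read as NA instead of a string
df_read <- read.csv(text = csv_text, na.strings = c("foo", "bar"),
                    stringsAsFactors = FALSE)

# Count the NA values that were produced in each column
colSums(is.na(df_read))
```

Note that supplying na.strings replaces the default of "NA", so if the file also uses a literal NA for missing values you would probably want to include "NA" in the vector as well.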
Using dplyr::na_if, you can replace specific values with NA. In this case, that would be "foo".
library(dplyr)
set.seed(1234)
df <- data.frame(
  id = 1:6,
  x = sample(c("a", "b", "foo"), 6, replace = TRUE),
  y = sample(c("c", "d", "foo"), 6, replace = TRUE),
  z = sample(c("e", "f", "foo"), 6, replace = TRUE),
  stringsAsFactors = FALSE
)
df
#>   id   x   y   z
#> 1  1   a   c   e
#> 2  2   b   c foo
#> 3  3   b   d   e
#> 4  4   b   d foo
#> 5  5 foo foo   e
#> 6  6   b   d   e
na_if(df$x, "foo")
#> [1] "a" "b" "b" "b" NA "b"
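If you need to do this for multiple columns, you can apply na_if to each of them from mutate with across (updated for dplyr v1.0.0+). Wrapping the call in a lambda avoids passing "foo" through the dots of across, which is deprecated as of dplyr 1.1.0.
df %>%
  mutate(across(c(x, y, z), ~ na_if(.x, "foo")))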
#>   id    x    y    z
#> 1  1    a    c    e
#> 2  2    b    c <NA>
#> 3  3    b    d    e
#> 4  4    b    d <NA>
#> 5  5 <NA> <NA>    e
#> 6  6    b    d    e
Another option is is.na<-:
is.na(df) <- df == "foo"
Its use may seem a bit counter-intuitive, but it actually assigns NA to the cells of df selected by the logical index on the right-hand side.
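A small worked example may make the mechanics clearer. This is only a sketch on a made-up data frame (df_demo and mask are illustrative names), split into two steps so the logical index is visible.

```r
# Small made-up data frame for illustration
df_demo <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "d", "foo"),
  stringsAsFactors = FALSE
)

# Comparing a data frame with a single value gives a logical matrix
# with the same dimensions, TRUE wherever a cell equals "foo"
mask <- df_demo == "foo"

# `is.na<-` sets every cell where the mask is TRUE to NA
is.na(df_demo) <- mask
df_demo
```

Columns without a match, such as id here, are left as they were.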
This could be done with dplyr::mutate_all() and replace:
library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(a = c('foo', 2, 3), b = c(1, 'foo', 3), c = c(1, 2, 'foobar'), d = c(1, 2, 3))
df
# A tibble: 3 x 4
#   a     b     c          d
#   <chr> <chr> <chr>  <dbl>
# 1 foo   1     1          1
# 2 2     foo   2          2
# 3 3     3     foobar     3
# funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
df <- mutate_all(df, ~ replace(., . == 'foo', NA))
df
# A tibble: 3 x 4
#   a     b     c          d
#   <chr> <chr> <chr>  <dbl>
# 1 <NA>  1     1          1
# 2 2     <NA>  2          2
# 3 3     3     foobar     3
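Another dplyr option, which works on a whole data frame in dplyr versions before 1.1.0, is:
df <- na_if(df, 'foo')
From dplyr 1.1.0 onwards na_if() is stricter about types and, as far as I know, no longer accepts a data frame here; apply it column by column with mutate and across instead.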
If you do not know the column names, or have a large number of columns to select, is.character() might be of use.
library(dplyr)
df <- data.frame(
  id = 1:6,
  x = sample(c("a", "b", "foo"), 6, replace = TRUE),
  y = sample(c("c", "d", "foo"), 6, replace = TRUE),
  z = sample(c("e", "f", "foo"), 6, replace = TRUE),
  stringsAsFactors = FALSE
)
df
#   id   x   y   z
# 1  1   b   d   e
# 2  2   a foo foo
# 3  3   a   d foo
# 4  4   b foo foo
# 5  5 foo foo   e
# 6  6 foo foo   f
df %>%
  mutate_if(is.character, list(~na_if(., "foo")))
#   id    x    y    z
# 1  1    b    d    e
# 2  2    a <NA> <NA>
# 3  3    a    d <NA>
# 4  4    b <NA> <NA>
# 5  5 <NA> <NA>    e
# 6  6 <NA> <NA>    f
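Since the scoped verbs such as mutate_if are marked as superseded in current dplyr, the same idea can be written with across and where. The following is a sketch assuming dplyr 1.0.0 or later, on a made-up data frame (df_chr and out are illustrative names).

```r
library(dplyr)

# Small made-up data frame for illustration
df_chr <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "d", "foo"),
  stringsAsFactors = FALSE
)

# Apply na_if() only to the character columns; id is left untouched
out <- df_chr %>%
  mutate(across(where(is.character), ~ na_if(.x, "foo")))
out
```

Restricting the selection to character columns also keeps na_if from comparing "foo" against numeric columns, which newer dplyr versions would likely reject as a type mismatch.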