Replacing character values with NA in a data frame

Question

I have a data frame containing (in random places) a character value (say "foo") that I want to replace with NA.

What's the best way to do so across the whole data frame?

Don't forget to convert your column with as.numeric(): switching a few values from "foo" to NA won't coerce the whole column to numeric. You have to force it (if that's what you're doing). – Brandon Bertelsen, Jul 28 '10 at 22:15

score 126 · Accepted Answer · edited Oct 20 '22 at 02:14

This:

df[df == "foo"] <- NA
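
As a minimal sketch of how this behaves, the example below uses a small made-up data frame (the column names id, x, and n are assumptions for illustration). It also shows the explicit numeric conversion mentioned in the comment on the question, since replacing "foo" with NA leaves the column as character.

```r
# Hypothetical example data; column names are illustrative only.
df <- data.frame(
  id = 1:4,
  x = c("a", "foo", "b", "foo"),
  n = c("1", "2", "foo", "4"),
  stringsAsFactors = FALSE
)

# df == "foo" yields a logical matrix with the same shape as df;
# indexing with it selects every matching cell.
df[df == "foo"] <- NA

# The n column is still character, so convert it explicitly.
df$n <- as.numeric(df$n)
str(df)
```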

answered Jul 28 '10 at 21:47 by c-urchin · edited Oct 20 '22 at 02:14 by ah bon

21

Note that if you were trying to replace NA with "foo", the reverse (`df[ df == NA ] = "foo"`) will not work; you would need to use `df[is.na(df)] <- "foo"` – Andy Barbour May 08 '13 at 22:05

If you have datetime columns in your dataframe, you may get an error similar to the following: "Error in as.POSIXlt.character(x, tz, ...) : character string is not in a standard unambiguous format". – Piethon Apr 27 '23 at 04:23
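
To illustrate the reverse replacement described in the first comment above, here is a small sketch on made-up data (the columns x and y are assumptions). Comparing against NA with == never produces TRUE, which is why is.na() is needed.

```r
# Hypothetical example data with some missing values.
df <- data.frame(
  x = c("a", NA, "b"),
  y = c(NA, "c", "d"),
  stringsAsFactors = FALSE
)

# Every comparison with NA is itself NA, so nothing is ever selected.
cmp <- df == NA
all(is.na(cmp))

# is.na() returns a proper TRUE/FALSE matrix that can be used as an index.
df[is.na(df)] <- "foo"
df
```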

score 77 · Answer 2 · answered Jul 28 '10 at 21:49

One way to nip this in the bud is to convert that string to NA when you first read the data in.

df <- read.csv("file.csv", na.strings = c("foo", "bar"))
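
A self-contained sketch of the same idea, reading from an inline string through read.csv's text argument instead of a file (the CSV contents here are invented for illustration). One side effect worth noting: a column whose only non-numeric entries are listed in na.strings is parsed as numeric directly, so no later conversion is needed.

```r
# Hypothetical CSV content, supplied inline so the example needs no file.
csv_text <- "id,x,y
1,a,foo
2,foo,3
3,b,bar"

df <- read.csv(
  text = csv_text,
  na.strings = c("foo", "bar"),
  stringsAsFactors = FALSE
)

# y contained only "foo", "3", and "bar", so it is read as an integer column.
str(df)
```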

JoFrhwld

camille · Answer 3 · 2021-10-30T21:20:45.193

Using dplyr::na_if, you can replace specific values with NA. In this case, that would be "foo".

library(dplyr)
set.seed(1234)

df <- data.frame(
  id = 1:6,
  x = sample(c("a", "b", "foo"), 6, replace = T),
  y = sample(c("c", "d", "foo"), 6, replace = T),
  z = sample(c("e", "f", "foo"), 6, replace = T),
  stringsAsFactors = F
)
df
#>   id   x   y   z
#> 1  1   a   c   e
#> 2  2   b   c foo
#> 3  3   b   d   e
#> 4  4   b   d foo
#> 5  5 foo foo   e
#> 6  6   b   d   e

na_if(df$x, "foo")
#> [1] "a" "b" "b" "b" NA  "b"

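If you need to do this for multiple columns, call na_if inside mutate with across (dplyr v1.0.0+). Wrapping it in a lambda avoids passing "foo" through the ... argument of across, which is deprecated as of dplyr 1.1.0.

df %>%
  mutate(across(c(x, y, z), ~ na_if(.x, "foo")))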
#>   id    x    y    z
#> 1  1    a    c    e
#> 2  2    b    c <NA>
#> 3  3    b    d    e
#> 4  4    b    d <NA>
#> 5  5 <NA> <NA>    e
#> 6  6    b    d    e

Axeman · Answer 4 · 2017-04-27T13:39:52.297

5

Another option is is.na<-:

is.na(df) <- df == "foo"

Its use may seem a bit counter-intuitive: it assigns NA to the elements of df selected by the index on the right-hand side.
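
A short sketch of the replacement form on made-up data (the columns x and y are assumptions). It also shows the functional call form, which returns a modified copy rather than changing its argument in place.

```r
# Hypothetical example data.
df0 <- data.frame(
  x = c("a", "foo"),
  y = c("foo", "b"),
  stringsAsFactors = FALSE
)

# Replacement form: sets the cells flagged TRUE on the right to NA.
df <- df0
is.na(df) <- df == "foo"

# Functional form: returns the modified copy and leaves df0 unchanged.
df2 <- `is.na<-`(df0, df0 == "foo")

df
```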

answered Apr 27 '17 at 13:23


2

or the same `'is.na<-'(df, df=="foo")` – jogo Apr 27 '17 at 14:09

sbha · Answer 5 · 2019-03-27T01:43:18.487

This could be done with dplyr::mutate_all() and replace:

library(dplyr)
# tibble() replaces the deprecated data_frame()
df <- tibble(a = c('foo', 2, 3), b = c(1, 'foo', 3), c = c(1, 2, 'foobar'), d = c(1, 2, 3))

> df
# A tibble: 3 x 4
     a     b      c     d
  <chr> <chr>  <chr> <dbl>
1   foo     1      1     1
2     2   foo      2     2
3     3     3 foobar     3


# funs() is deprecated; a formula lambda does the same job
df <- mutate_all(df, ~ replace(., . == 'foo', NA))

> df
# A tibble: 3 x 4
      a     b      c     d
  <chr> <chr>  <chr> <dbl>
1  <NA>     1      1     1
2     2  <NA>      2     2
3     3     3 foobar     3

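Another dplyr option is to call na_if on the whole data frame. This worked in older dplyr releases; since dplyr 1.1.0, na_if requires y to be compatible with the type of x, so this call is expected to fail there and na_if should be applied per column instead:

df <- na_if(df, 'foo')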

score 2 · Answer 6 · answered Jul 31 '20 at 10:53

If you do not know the column names, or have a large number of columns to select, is.character() might be of use.

library(dplyr)

df <- data.frame(
  id = 1:6,
  x = sample(c("a", "b", "foo"), 6, replace = T),
  y = sample(c("c", "d", "foo"), 6, replace = T),
  z = sample(c("e", "f", "foo"), 6, replace = T),
  stringsAsFactors = F
)
df
#   id   x   y   z
# 1  1   b   d   e
# 2  2   a foo foo
# 3  3   a   d foo
# 4  4   b foo foo
# 5  5 foo foo   e
# 6  6 foo foo   f

df %>% 
  mutate_if(is.character, list(~na_if(., "foo")))
#   id    x    y    z
# 1  1    b    d    e
# 2  2    a <NA> <NA>
# 3  3    a    d <NA>
# 4  4    b <NA> <NA>
# 5  5 <NA> <NA>    e
# 6  6 <NA> <NA>    f
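
The scoped verb mutate_if is superseded in current dplyr. As a hedged sketch, the same selection can be written with across() and where(), assuming dplyr 1.0.0 or later; the data below is made up for illustration.

```r
library(dplyr)

# Hypothetical example data; column names are illustrative only.
df <- data.frame(
  id = 1:3,
  x = c("a", "foo", "b"),
  y = c("foo", "c", "foo"),
  stringsAsFactors = FALSE
)

# where(is.character) picks only the character columns, so id is untouched.
out <- df %>%
  mutate(across(where(is.character), ~ na_if(.x, "foo")))

out
```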

score 0 · Answer 7 · edited Jun 24 '16 at 00:01

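An alternative is to loop over the columns. The example below replaces empty strings and NA values with "ALL"; swap the comparison and replacement values to turn "foo" into NA instead:

for (i in seq_len(ncol(DF))) {
  # Replace empty strings in column i with "ALL"
  DF[which(DF[, i] == ""), i] <- "ALL"
  # Replace NA values in column i with "ALL"
  DF[which(is.na(DF[, i])), i] <- "ALL"
}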

answered Feb 18 '16 at 16:56 by Abhi · edited Jun 24 '16 at 00:01 by Hobo
