I'm working in R, and I would like to replace all the empty elements of my data.frame with a NA value.

So, if I had this data frame as input:

         unit        delta
1         aaa           696
2         bbb           388
3                       388
4         ccc             0
5         ddd          1630
6         eee             4 

then I would like to have this as output:

         unit         delta
1         aaa           696
2         bbb           388
3        <NA>           388
4         ccc             0
5         ddd          1630
6         eee             4 

How could I do this?

DavideChicco.it
  • Have you tried `gsub()`? If your blanks are all consistent, it'll get the job done. – Badger Oct 13 '15 at 22:05
  • What's the current structure of your data? It would help to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Is `unit` a factor or a character vector? Are the blank values spaces or zero-length strings? – MrFlick Oct 13 '15 at 22:05
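
The questions in that comment can be answered with a few base R calls. The following is only a sketch: the data frame `df` is a made-up stand-in (it assumes `unit` is a character column containing both a zero-length string and a single space), not the asker's actual data.

```r
# Hypothetical data: row 2 holds a zero-length string, row 3 a single space
df <- data.frame(unit = c("aaa", "", " ", "ccc"),
                 delta = c(696, 388, 388, 0),
                 stringsAsFactors = FALSE)

str(df)                 # column types at a glance
unit_class <- class(df$unit)

# Distribution of string lengths: a length of 0 means zero-length strings
length_counts <- table(nchar(df$unit))

# Values that are empty once surrounding whitespace is stripped
n_blank <- sum(trimws(df$unit) == "")
```

If `n_blank` is larger than the count of zero-length strings, some blanks are whitespace rather than `""`, and a plain comparison against `""` would miss them.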

2 Answers

Regardless of whether it is a character column or a factor column, the `is.na<-` replacement function has a method to handle either.

is.na(df) <- df == ""

should get the job done fine. For operating only on the unit column, you can do

is.na(df$unit) <- df$unit == ""

Just to check further, we can assign different classes to the different columns and see what happens.

df <- read.csv(text = "unit,delta
bbb,388
,388
ccc,
ddd,1630", colClasses = c("factor", "character"))

df
#   unit delta
# 1  bbb   388
# 2        388
# 3  ccc      
# 4  ddd  1630

is.na(df) <- df == ""
df
#   unit delta
# 1  bbb   388
# 2 <NA>   388
# 3  ccc  <NA>
# 4  ddd  1630

sapply(df, class)
#       unit       delta 
#   "factor" "character" 
Rich Scriven
  • It might also be worth mentioning that many of these problems can be sorted out at the input stage by using the `na.strings` argument of `read.table` and family. – user20650 Oct 14 '15 at 01:07
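
A minimal sketch of that input-stage approach, assuming the data arrives as CSV. The inline `csv_text` is a hypothetical stand-in for a real file, and keeping `"NA"` in `na.strings` alongside `""` is a choice made here so that literal `NA` entries are still treated as missing.

```r
# Hypothetical inline CSV standing in for a file on disk
csv_text <- "unit,delta
aaa,696
bbb,388
,388
ccc,0"

# Treat empty fields (and the literal string "NA") as missing while reading
df <- read.csv(text = csv_text,
               na.strings = c("", "NA"),
               stringsAsFactors = FALSE)

missing_unit <- is.na(df$unit)
missing_unit
# [1] FALSE FALSE  TRUE FALSE
```

With a real file, the `text =` argument would be replaced by the file path.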

In the Hadleyverse, it'd be something like this:

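library(dplyr)

# Example data matching the question; substitute your own data frame
d <- data.frame(unit = c("aaa", "bbb", "", "ccc", "ddd", "eee"),
                delta = c(696, 388, 388, 0, 1630, 4))

# Replace empty strings in `unit` with NA
d %>%
  mutate(unit = replace(unit, unit == "", NA))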

  unit delta
1  aaa   696
2  bbb   388
3 <NA>   388
4  ccc     0
5  ddd  1630
6  eee     4
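
The call above touches only `unit`. To cover every character column at once, one possible variant (a sketch, assuming dplyr 1.0 or later for `across()` and that the columns are character rather than factor) uses `na_if()`:

```r
library(dplyr)

# Same example data as in the question, kept as character rather than factor
d <- data.frame(unit = c("aaa", "bbb", "", "ccc", "ddd", "eee"),
                delta = c(696, 388, 388, 0, 1630, 4),
                stringsAsFactors = FALSE)

# Convert "" to NA in every character column; numeric columns are left alone
out <- d %>%
  mutate(across(where(is.character), ~ na_if(.x, "")))
```

Factor columns are skipped by `where(is.character)`, so they would need converting first or a different selector.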
maloneypatr