Replace missing values (NA) with blank (empty string)

Question

I have a dataframe with an NA row:

 df = data.frame(c("classA", NA, "classB"), t(data.frame(rep("A", 5), rep(NA, 5), rep("B", 5))))
 rownames(df) <- c(1,2,3)
 colnames(df) <- c("class", paste("Year", 1:5, sep = ""))

 > df
   class Year1 Year2 Year3 Year4 Year5
1 classA     A     A     A     A     A
2   <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
3 classB     B     B     B     B     B

I introduced the empty row (NA row) on purpose because I wanted to have some space between classA row and classB row.

Now I would like to replace the <NA> values with blanks, so that the second row looks like an empty row.

I tried:

 df[is.na(df)] <- ""

and

 df[df == "NA"] <- ""

but neither worked.

Any ideas? Thanks!

Your first attempt works just fine for me. What about it didn't work? — joran, Oct 25 '13 at 14:38
I still see `<NA>` in the dataframe; the code doesn't seem to affect anything — Mayou, Oct 25 '13 at 14:38
It has to do with factors (of course!)... try `str(df)` (I jumped the gun on my answer!) — Simon O'Hanlon, Oct 25 '13 at 14:39
Gah. I basically have to forget I'm running `stringsAsFactors = FALSE` once every morning on SO. Listen to Simon. — joran, Oct 25 '13 at 14:39
@SimonO101 Your answer is right on! Factors, I always forget about those.. Thanks! — Mayou, Oct 25 '13 at 14:41
By the way, never just say "it didn't work". You neglected to mention the six (!) warning messages you surely received upon running that code. The warning message should have been awfully suggestive, don't you think? — joran, Oct 25 '13 at 14:42
@Jilber is right really. I typed up an embarrassingly wrong answer! Lucky SO doesn't keep the edit history, I deleted it so quick! (Hopefully) :-) — Simon O'Hanlon, Oct 25 '13 at 14:43
The brackets around the `<NA>` indicate that they are not strings. [Have a look HERE](http://stackoverflow.com/a/16253827/1492421) for more info. — Ricardo Saporta, Oct 25 '13 at 14:45
@RicardoSaporta I should really remember it is that way round considering I upvoted that answer before. — Simon O'Hanlon, Oct 25 '13 at 14:46
I said warning, not error. They are different. And R 3.0.1 most definitely throws 6 warning messages upon running your code. — joran, Oct 25 '13 at 14:50
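
To see what the comments above are pointing at, here is a minimal sketch that rebuilds a two-column version of the question's data with factor columns. It assumes R older than 4.0, or `stringsAsFactors = TRUE` passed explicitly as done here; the shortened column set is only for illustration.

```r
# Two-column stand-in for the question's data, with factor columns
# (the default behaviour of data.frame() before R 4.0).
df <- data.frame(
  class = c("classA", NA, "classB"),
  Year1 = c("A", NA, "B"),
  stringsAsFactors = TRUE
)

str(df)            # both columns are reported as factors
levels(df$class)   # the empty string is not one of the levels

# "" is not an existing level, so the assignment is rejected with an
# "invalid factor level, NA generated" warning (one per affected column).
# The warnings are suppressed here only to keep the output short.
suppressWarnings(df[is.na(df)] <- "")

any(is.na(df))     # the missing values are still there
```

Running `str(df)` first, as suggested in the comments, is usually the quickest way to tell whether factors are involved.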

score 53 · Accepted Answer · answered Oct 25 '13 at 14:38

Another alternative:

df <- sapply(df, as.character) # since your values are `factor`
df[is.na(df)] <- 0

If you want blanks instead of zeroes:

> df <- sapply(df, as.character)
> df[is.na(df)] <- " "
> df
     class    Year1 Year2 Year3 Year4 Year5
[1,] "classA" "A"   "A"   "A"   "A"   "A"  
[2,] " "      " "   " "   " "   " "   " "  
[3,] "classB" "B"   "B"   "B"   "B"   "B"

If you want a data.frame, then just use as.data.frame:

> as.data.frame(df)
   class Year1 Year2 Year3 Year4 Year5
1 classA     A     A     A     A     A
2                                     
3 classB     B     B     B     B     B

Jilber Urbina


I thought " " is space and "" is blank. Am I right? – RanonKahn Apr 29 '20 at 17:31
Careful if you are replacing NAs with blanks (""). The conversion back to data.frame will introduce NAs again. I found that the safest approach is to replace NAs directly, without converting the data frame to a character matrix. – Firefighter1017 Oct 29 '21 at 18:42
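
One way to act on the last comment, sketched in base R: convert only the factor columns to character in place, then replace the NAs directly, so the object never stops being a data.frame. The two-column data and the helper name `is_fct` are assumptions made for the example, not part of the answer above.

```r
# Two-column stand-in for the question's data, with factor columns.
df <- data.frame(
  class = c("classA", NA, "classB"),
  Year1 = c("A", NA, "B"),
  stringsAsFactors = TRUE
)

# Find the factor columns and convert just those to character.
# Assigning into df[is_fct] keeps df a data.frame.
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], as.character)

# With character columns, the question's first attempt works as intended.
df[is.na(df)] <- ""

df
```

This uses "" (an empty string) rather than " " (a single space), which also addresses the distinction raised in the first comment.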

score 14 · Answer 2 · answered Oct 25 '13 at 17:55

This answer is more of an extended comment.

What you're trying to do isn't what I would consider good practice. R is not, say, Excel, so doing something like this just to create visual separation in your data is just going to give you a headache later on down the line.

If you really only cared about the visual output, I can offer two suggestions:

Use the `na.print` argument of `print` when you want to view the data with that visual separation.

print(df, na.print = "")
#    class Year1 Year2 Year3 Year4 Year5
# 1 classA     A     A     A     A     A
# 2                                     
# 3 classB     B     B     B     B     B

Realize that even the above is not the best suggestion. Get both visual and content separation by converting your data.frame to a list:

split(df, df$class)
# $classA
#    class Year1 Year2 Year3 Year4 Year5
# 1 classA     A     A     A     A     A
# 
# $classB
#    class Year1 Year2 Year3 Year4 Year5
# 3 classB     B     B     B     B     B

For `na.print` to work, the dataframe columns must now be character. If they are not, convert the dataframe with `dplyr::mutate(across(everything(), as.character))` — Agile Bean, Dec 09 '21 at 12:52
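
A small sketch of the `na.print` approach with character columns, which is the case the comment above describes. The two-column data is an assumption for illustration. The point it shows is that only the printed output changes, while the NAs remain in the data.

```r
# Two-column stand-in for the question's data, with character columns.
df <- data.frame(
  class = c("classA", NA, "classB"),
  Year1 = c("A", NA, "B"),
  stringsAsFactors = FALSE
)

# Missing values are shown as blanks in the printed table only.
print(df, na.print = "")

# The underlying data is unchanged: the second row is still NA.
is.na(df$class)
```

Because the data keeps its NAs, functions such as `is.na()` and `complete.cases()` continue to treat the spacer row as missing.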

score 1 · Answer 3 · answered Apr 02 '23 at 16:06

Here is a dplyr option: mutate across all the columns (`everything()`) and, in each column (`.x`), replace the NA values with an empty string:

library(dplyr)
df %>%
  mutate(across(everything(), ~ replace(.x, is.na(.x), "")))
#>    class Year1 Year2 Year3 Year4 Year5
#> 1 classA     A     A     A     A     A
#> 2                                     
#> 3 classB     B     B     B     B     B

Created on 2023-04-02 with reprex v2.0.2