How do you replace empty cells with 0?

Question

I need to replace empty cells with zero (0) in R. I have a data frame like this:

dput(df)

structure(list(CHANNEL = structure(c(1L, 1L, 1L), .Label = "Native BlackBerry App", class = "factor"), 
    DATE = structure(c(1L, 1L, 1L), .Label = "01/01/2011", class = "factor"), 
    HOUR = structure(c(3L, 1L, 2L), .Label = c("1:00am-2:00am", 
    "2:00am-3:00am", "Midnight-1:00am"), class = "factor"), UNIQUE_USERS = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor"), LOGON_VOLUME = structure(c(1L, 
    1L, 1L), .Label = "", class = "factor")), .Names = c("CHANNEL", 
"DATE", "HOUR", "UNIQUE_USERS", "LOGON_VOLUME"), row.names = c(NA, 
-3L), class = "data.frame")

I have this function:

sapply(df, function (x) 
     as.numeric(gsub("(^ +)|( +$)", "0", x)))

It does not work; I get these errors:

[ reached getOption("max.print") -- omitted 422793 rows ]
Warning messages:
1: In FUN(X[[4L]], ...) : NAs introduced by coercion
2: In FUN(X[[4L]], ...) : NAs introduced by coercion
3: In FUN(X[[4L]], ...) : NAs introduced by coercion
4: In FUN(X[[4L]], ...) : NAs introduced by coercion

Update: when I apply this function to df:

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) )

I get this:

     CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" ""           ""          
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   ""           ""          
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   ""           ""

`Warning` != `Error`. If you run `as.numeric('x')` you'll see the same warning. — Justin, Sep 11 '13 at 20:31
Please read this: http://stackoverflow.com/q/5963269/1003565 — Dason, Sep 11 '13 at 20:32
@user1471980 read the whole answer. You can use `head` to post a small portion. — Señor O, Sep 11 '13 at 20:37
why is this post being down voted, this is a good question, I really tried fixing it but not able to come to a solution. If you cannot help, dont bother. — user1471980, Sep 11 '13 at 20:46
Why are you using `sapply`? That's just going to return a vector or array. — Señor O, Sep 11 '13 at 20:49

Simon O'Hanlon · Accepted Answer · 2013-09-11T21:27:53.213

You define an anonymous function in sapply but then never use its argument. Inside the function, gsub should be called on x rather than on df:

sapply(df, function (x) gsub("(^ +)|( +$)", "0", x) ) # last argument is x, not df

You also coerce everything to numeric, which produces NA for every string that is not a valid number. Each column of a data.frame is an atomic vector, so it can hold only one type of data, and the matrix that sapply returns likewise needs a single type for all of its elements. Here that common type is character.
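To make both points concrete, here is a small sketch. The vector x and the data frame toy are made up for illustration and are not the asker's data.

```r
# Strings that are not valid numbers become NA under as.numeric().
# suppressWarnings() hides the "NAs introduced by coercion" warning.
x <- c("12", "", "Midnight-1:00am")
nums <- suppressWarnings(as.numeric(x))
print(nums)

# sapply() simplifies its result to a matrix, and a matrix has one type.
# toy is a hypothetical stand-in for df.
toy <- data.frame(HOUR = c("1:00am-2:00am", "2:00am-3:00am"),
                  UNIQUE_USERS = c("", ""),
                  stringsAsFactors = TRUE)
m <- sapply(toy, as.character)
print(is.matrix(m))
print(typeof(m))
```

Only "12" survives the conversion, and the result of sapply is a character matrix rather than a data.frame.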

Perhaps you meant to do this...

sapply( df , gsub , pattern = "^\\s*$" , replacement = 0 )

     CHANNEL                 DATE         HOUR              UNIQUE_USERS LOGON_VOLUME
[1,] "Native BlackBerry App" "01/01/2011" "Midnight-1:00am" "0"          "0"         
[2,] "Native BlackBerry App" "01/01/2011" "1:00am-2:00am"   "0"          "0"         
[3,] "Native BlackBerry App" "01/01/2011" "2:00am-3:00am"   "0"          "0"
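The reason the original pattern left the empty cells alone seems to be that "(^ +)|( +$)" needs at least one space to match, while "^\\s*$" also matches a string with no characters at all. A quick comparison on a made-up vector (cells is an illustrative name, not part of the question):

```r
cells <- c("", "   ", " 42 ", "Native BlackBerry App")

# Original pattern: a run of spaces at the start OR at the end.
# An empty string contains no spaces, so "" is returned unchanged.
old <- gsub("(^ +)|( +$)", "0", cells)

# Anchored pattern: the whole string is empty or whitespace only.
# Strings with any other content are left untouched.
new <- gsub("^\\s*$", "0", cells)

print(old)
print(new)
```

Note that the original pattern also rewrites the padding around real values such as " 42 ", which is probably not what was intended.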

With gsub you still have to convert the result to a number afterwards, and the conversion gives NA for any value that is not a character representation of a number. If you need to change entire columns, you could check whether every cell in a column is empty and, if so, replace the whole column with zeros. You can't have character elements and numeric elements in the same column.

# TRUE for each column in which every cell is empty or whitespace-only
empty <- sapply( df , function(x) all( grepl( "^\\s*$" , x ) ) )
df[ , empty ] <- rep( 0 , nrow(df) )
#                CHANNEL       DATE            HOUR UNIQUE_USERS LOGON_VOLUME
#1 Native BlackBerry App 01/01/2011 Midnight-1:00am            0            0
#2 Native BlackBerry App 01/01/2011   1:00am-2:00am            0            0
#3 Native BlackBerry App 01/01/2011   2:00am-3:00am            0            0
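If only some cells in a column are empty, a per-cell version may be closer to what was asked. The following is a sketch under assumptions, not the method from this answer: fill_blank is a hypothetical helper, toy is made-up data, and columns that still contain non-numeric text (or NA) are deliberately left as character.

```r
# Hypothetical helper: replace blank cells with "0", then convert the
# column to numeric only if every cell parses as a number.
fill_blank <- function(col) {
  col <- as.character(col)                  # factors become their labels
  col[grepl("^\\s*$", col)] <- "0"          # empty or whitespace-only cells
  num <- suppressWarnings(as.numeric(col))
  if (any(is.na(num))) col else num         # keep text columns as character
}

toy <- data.frame(HOUR = c("Midnight-1:00am", "1:00am-2:00am"),
                  UNIQUE_USERS = c("", "15"),
                  stringsAsFactors = TRUE)

# toy[] <- lapply(...) keeps the data.frame, unlike sapply(),
# which would simplify the result to a matrix.
toy[] <- lapply(toy, fill_blank)
str(toy)
```

Assigning into toy[] preserves the data.frame structure, so each column keeps its own type after the replacement.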

I am trying to do dput(head(df,2) get a subset of a huge df, but for some reason it is not working. It is giving me the whole df output in dput format. I followed your suggestions without any success. — user1471980, Sep 11 '13 at 21:03
@user1471980 It should give you the first two rows, assuming `df` really is a `data.frame`? `class( df )`? — Simon O'Hanlon, Sep 11 '13 at 21:08
@user1471980 Ok. You can't have character and numeric elements in the same column. — Simon O'Hanlon, Sep 11 '13 at 21:28
