1

I am trying to replace default "1900-01-01" for all column name that matches with date and DOB with na value. Below are the code.

library(dplyr)

data[is.na(data),] %>% select(matches("date"),DOB) %>% head()

FirstTradeDate LastTradeDate DOB
NA <NA> <NA> <NA>
NA.1 <NA> <NA> <NA>
NA.2 <NA> <NA> <NA>
NA.3 <NA> <NA> <NA>
NA.4 <NA> <NA> <NA>
NA.5 <NA> <NA> <NA>

Is this possible to replace with dplyr package? I have tried mutate and transmutate both are not working. AnyIdea?

I also tried

data[is.na(data),c(grep("[Dd]ate|DOB",names(data)))]<-as.Date("1970-01-01")

Error in `[<-.data.frame`(`*tmp*`, is.na(data), c(grep("[Dd]ate|DOB",  : 
  non-existent rows not allowed

traceback()

3: stop("non-existent rows not allowed")
2: `[<-.data.frame`(`*tmp*`, is.na(data), c(grep("[Dd]ate|DOB", 
       names(data))), value = 0)
1: `[<-`(`*tmp*`, is.na(data), c(grep("[Dd]ate|DOB", names(data))), 
       value = 0)

Is there any other approach?

oguz ismail
  • 1
  • 16
  • 47
  • 69
Karam Chand
  • 63
  • 1
  • 7
  • 3
    Please provide a [**minimal, self contained example**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) with sample data and the desired output. – Henrik Dec 09 '14 at 09:29

1 Answers1

1

You could use base R

indx <- grep('DOB|Date', colnames(dat1))
dat1[indx][dat1[indx]=='1900-01-01'] <- NA
head(dat1)
#  FirstTradeDate LastTradeDate        DOB        Val
#1     1900-01-03          <NA> 1900-01-04  0.8804658
#2           <NA>    1900-01-03 1900-01-05 -1.6254185
#3     1900-01-02    1900-01-02       <NA> -0.3863661
#4     1900-01-04    1900-01-05       <NA>  0.6196153
#5           <NA>    1900-01-03       <NA> -1.1075294
#6     1900-01-05    1900-01-04 1900-01-03 -0.9048846

Or using data.table

library(data.table) 
DT <- setDT(dat1)[, lapply(.SD, function(x) replace(x,
                       x=='1900-01-01', NA)), .SDcols=indx]

Or using dplyr

library(dplyr)
dat1 %>%
   mutate_each(funs(replace(., .=='1900-01-01', NA)),matches('Date'), DOB)  %>%
   head()
#  FirstTradeDate LastTradeDate        DOB        Val
#1     1900-01-03          <NA> 1900-01-04  0.8804658
#2           <NA>    1900-01-03 1900-01-05 -1.6254185
#3     1900-01-02    1900-01-02       <NA> -0.3863661
#4     1900-01-04    1900-01-05       <NA>  0.6196153
#5           <NA>    1900-01-03       <NA> -1.1075294
#6     1900-01-05    1900-01-04 1900-01-03 -0.9048846

Update

The data.table method can also be done in a loop (as mentioned by @Arun)

   nm1 <- names(dat1)[indx]
   DT <- data.table(dat1)
   for(nm in nm1){
     setkeyv(DT, nm)
     DT[J(as.Date('1900-01-01')), (nm):=NA]
   }

Alternatively using for-loop with set:

require(data.table)
indx = grep("DOB|Date", names(dat1), value=TRUE)
setDT(dat1)
for (j in indx) {
    thiscol = dat1[[j]]
    set(dat1, i = which(thiscol == "1900-01-01"), j = j, value = NA)
}

data

Dates <- seq(as.Date('1900-01-01'), length.out=5, by='1 day')
set.seed(248)
dat1 <- data.frame(FirstTradeDate=sample(Dates,100,replace=TRUE),
  LastTradeDate=sample(Dates,100, replace=TRUE), DOB=sample(Dates,100,
     replace=TRUE), Val=rnorm(100))
Arun
  • 116,683
  • 26
  • 284
  • 387
akrun
  • 874,273
  • 37
  • 540
  • 662
  • nice `DT` solution, but the idiomatic way would be to use a for-loop along with `set`, and modify by reference, unless OP wants to keep both data sets.. – Arun Dec 09 '14 at 10:20
  • @Arun I updated with a `for` loop method. Not sure if this is what you have in mind. – akrun Dec 10 '14 at 11:35
  • 1
    It's different from what I had in mind, but is great as well. The `setkey` can be moved outside of the for-loop. I added the version had in mind as well. – Arun Dec 10 '14 at 12:37
  • @arun Many thanks for your prompt respond. I tried your code. It worked for me. I am new to stackoverflow. I tried to vote up ur solution but I can't. It says you need 15 reputaion to vote up. – Karam Chand Feb 09 '15 at 11:51