I have a data frame with columns that contain duplicate information and gaps. For example, let's say the data frame has both START_DATE and BEGIN_DATE, which represent the same thing. The data looks like this:

START_DATE  BEGIN_DATE
----------  ----------
NA          10/10/2011
NA          12/12/2011
9/4/2011    9/4/2011
3/22/2014   3/22/2014
5/5/2011    NA

I want:

DATE
-------
10/10/2011
12/12/2011
9/4/2011
3/22/2014
5/5/2011

This doesn't work for a couple of reasons:

transform(df, DATE = if(is.na(START_DATE)) BEGIN_DATE else START_DATE)

What is the right way to do this in R?

ahoffer
  • possible duplicate http://stackoverflow.com/questions/11865195/using-if-else-on-a-data-frame – mlt Jun 17 '14 at 21:16

3 Answers

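This will handle factors correctly, because both columns are converted to character before pmin() picks the non-missing value. Note that the result is a character vector, not a Date: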

with(dat, pmin(as.character(START_DATE) , as.character(BEGIN_DATE), na.rm=TRUE))
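To see the one-liner end to end, here is a minimal sketch. The data frame `dat` is made up to mirror the question (with factor columns on purpose), and the final `as.Date()` step with the `%m/%d/%Y` format is my assumption about how the dates are written, not part of the original answer.

```r
# Hypothetical data mirroring the question; the columns are factors on purpose.
dat <- data.frame(
  START_DATE = c(NA, NA, "9/4/2011", "3/22/2014", "5/5/2011"),
  BEGIN_DATE = c("10/10/2011", "12/12/2011", "9/4/2011", "3/22/2014", NA),
  stringsAsFactors = TRUE
)

# pmin() on character vectors; na.rm = TRUE keeps whichever value is present.
dat$DATE <- with(dat, pmin(as.character(START_DATE),
                           as.character(BEGIN_DATE),
                           na.rm = TRUE))

# The result is character, so parse it if a real Date column is needed.
# Assumption: dates are written month/day/year.
dat$DATE <- as.Date(dat$DATE, format = "%m/%d/%Y")
```

When both columns hold a value and the two differ, pmin() returns the string that sorts first, which is not necessarily the earlier date, so this is safest when the columns agree wherever they overlap.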
IRTFM

Most database implementations have a function called coalesce for this. Sadly, it is missing from base R, so I have created one. For just two columns it may be overkill, but it works well if you have more columns or want to supply a default when all values are missing. This method also preserves the Date class.

This code is available here: coalesce.R

You would use it like this:

d1<-c(as.Date("2011-10-10"), NA, as.Date("2011-09-04"))
d2<-c(as.Date("2011-10-10"), as.Date("2011-12-12"), NA)

coalesce(d1,d2)   
# [1] "2011-10-10" "2011-12-12" "2011-09-04"

If you have devtools installed, you can automatically source this gist with

library(devtools)
source_gist(10205794)
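If you would rather not source a gist, the idea can be sketched in a few lines of base R. This is not the code from the gist: `coalesce_sketch` is a hypothetical helper of my own, and it assumes all vectors have the same length and class.

```r
# Hypothetical helper, not the gist's implementation: for each position,
# take the first non-missing value across the supplied vectors.
# Assumes every argument has the same length and class.
coalesce_sketch <- function(...) {
  vecs <- list(...)
  out <- vecs[[1]]
  for (v in vecs[-1]) {
    gaps <- is.na(out)
    out[gaps] <- v[gaps]  # subassignment keeps the class of `out`
  }
  out
}

d1 <- as.Date(c("2011-10-10", NA, "2011-09-04"))
d2 <- as.Date(c("2011-10-10", "2011-12-12", NA))
res <- coalesce_sketch(d1, d2)
```

Because the gaps are filled by subassignment into the first vector, `res` stays a Date vector rather than falling back to numeric.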
MrFlick

Use `ifelse`, which is vectorized (unlike `if`):

transform(df, DATE = ifelse(is.na(START_DATE), BEGIN_DATE, START_DATE))

Since `ifelse` converts dates to numeric, convert the columns to character first and back to Date afterwards:

transform(df, DATE = as.Date(ifelse(is.na(START_DATE), as.character(BEGIN_DATE), as.character(START_DATE))))
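A small sketch of what happens here, assuming the two columns are of class Date. The data frame is made up to match the question, and `raw`, `DATE` and `DATE2` are names chosen just for illustration.

```r
# Hypothetical data; assumes both columns are already of class Date.
df <- data.frame(
  START_DATE = as.Date(c(NA, NA, "2011-09-04", "2014-03-22", "2011-05-05")),
  BEGIN_DATE = as.Date(c("2011-10-10", "2011-12-12", "2011-09-04", "2014-03-22", NA))
)

# ifelse() drops the Date class and returns the underlying day counts.
raw <- with(df, ifelse(is.na(START_DATE), BEGIN_DATE, START_DATE))
class(raw)  # "numeric"

# Option 1: restore the class from the day counts (days since 1970-01-01).
df$DATE <- as.Date(raw, origin = "1970-01-01")

# Option 2: go through character, as in the transform() call above.
df$DATE2 <- with(df, as.Date(ifelse(is.na(START_DATE),
                                    as.character(BEGIN_DATE),
                                    as.character(START_DATE))))
```

Both options should give the same Date column. The character route relies on as.character() producing ISO-formatted strings, which holds for Date columns.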
rrs
  • This seems to make DATE a numeric column. Perhaps `transform(df, DATE = as.Date(ifelse(is.na(START_DATE), BEGIN_DATE, START_DATE), origin = "1970-01-01"))` instead? – Kara Woo Jun 17 '14 at 21:38
  • It only makes a numeric column if `START_DATE` and `BEGIN_DATE` are factors. – rrs Jun 17 '14 at 21:40
  • In my image, the new column DATE was numeric. I loaded the package "zoo" to avoid the date offsets and tried: transform(df, DATE = as.Date(ifelse(is.na(START_DATE), CLOSED_DATE, START_DATE))), but my image hangs and I have to kill RStudio. – ahoffer Jun 17 '14 at 21:59
  • Ah, I used characters for the dates. It looks like `ifelse` [converts dates to numeric](http://stackoverflow.com/questions/6668963/how-to-prevent-ifelse-from-turning-date-objects-into-numeric-objects). If you are okay with character dates, then this answer works. You can always convert to date after the `ifelse`. – rrs Jun 17 '14 at 22:02
  • I'm new to R, and I'm confused. The dates in the original columns are POSIXct. Using the "ifelse" statement in the transform creates a value of 1382572800 for the date 2013-10-24. The function as.Date doesn't convert this date into a sensible value (3787325-10-04). I'm confused. – ahoffer Jun 17 '14 at 22:23
  • Success! This is ugly but it works: transform(df, DATE = as.Date(ifelse(is.na(START_DATE), as.character(BEGIN_DATE), as.character(START_DATE)))) – ahoffer Jun 17 '14 at 22:28
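The odd value in the POSIXct case comes from units: a POSIXct value counts seconds since 1970-01-01, while as.Date() on a plain number treats it as days. A minimal sketch of converting the number reported in the comments, assuming the timestamps are in UTC (the variable names are made up for illustration):

```r
secs <- 1382572800  # value reported in the comments for 2013-10-24

# Interpret the number as seconds since the epoch, then drop the time part.
# Assumption: the original timestamps are in UTC.
stamp <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
day <- as.Date(stamp, tz = "UTC")
format(day)  # "2013-10-24"
```

With a different time zone the date can shift by a day, so pass the `tz` that matches the data.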