I think what you need is ifelse:
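x = c("2012-01-01", "13/01/2012", "")
x1 = as.Date(x, format="%Y-%m-%d")   # parses yyyy-mm-dd strings, NA otherwise
x2 = as.Date(x, format="%d/%m/%Y")   # parses dd/mm/yyyy strings, NA otherwise
y = as.Date(
  ifelse(!is.na(x1), x1,
         ifelse(!is.na(x2), x2, x1)),
  # ifelse() returns plain day counts, so an origin is needed to turn them back into dates
  origin = as.Date("1970/01/01")
)
y
## [1] "2012-01-01" "2012-01-13" NA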
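I had to wrap the ifelse() call in as.Date(); otherwise the dates were printed as numbers rather than as actual dates, because ifelse() does not keep the "Date" class of its inputs. The origin argument tells as.Date() how to interpret those numbers (it is required in R versions before 4.3.0).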
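To see why the wrapping is needed, here is a minimal sketch showing that ifelse() hands back plain day counts. The vector d and the fallback date 2000-01-01 are purely illustrative.

```r
# Illustrative data: one parsed date and one missing value
d = as.Date(c("2012-01-01", NA))

# Replace missing dates with a fallback. ifelse() starts from the logical
# test vector and copies values into it, so the "Date" class is not kept.
z = ifelse(!is.na(d), d, as.Date("2000-01-01"))
class(z)   # "numeric"
z          # 15340 10957: days since 1970-01-01

# Converting back to dates needs an origin (mandatory before R 4.3.0)
as.Date(z, origin = "1970-01-01")
```

The numbers are simply the internal representation of the dates, which is why as.Date() with an origin recovers them exactly.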
Edit
Here is how the above code works. ifelse(condition, value_if_true, value_if_false) (the actual argument names are test, yes and no) is a vectorized function that takes three vectors as arguments, normally of the same length. (In R, if one of these arguments is shorter, for example a single value, it is automatically repeated the proper number of times; the term for this is "recycling".) The execution of ifelse() is as follows:
a) For every element in condition, check whether it is TRUE or FALSE.
b) If condition is TRUE, take the corresponding value from the value_if_true vector.
c) Otherwise, if condition is FALSE, take the corresponding value from the value_if_false vector. (If condition is NA, the result at that position is NA.)
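The three steps above, plus recycling, can be seen in a small sketch; the vectors here are made-up illustration values.

```r
# condition: TRUE, FALSE, NA, TRUE
condition = c(TRUE, FALSE, NA, TRUE)

# value_if_true supplies one value per element; the single value 0
# for value_if_false is recycled to length 4
r = ifelse(condition, c(10, 20, 30, 40), 0)
r   # c(10, 0, NA, 40): TRUE takes from value_if_true, FALSE takes 0, NA stays NA
```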
You can also nest ifelse() calls inside one another, which allows three-way (or larger) condition checks. Be warned that nesting more than a few ifelse calls gets messy very quickly.
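As an illustration of a three-way check (the vector v and the labels are hypothetical), the inner ifelse only decides the cases the outer one did not:

```r
v = c(-2, 0, 3)

# The outer ifelse handles v < 0; for the remaining elements the inner
# ifelse chooses between "zero" and "positive"
s = ifelse(v < 0, "negative",
           ifelse(v == 0, "zero", "positive"))
s   # c("negative", "zero", "positive")
```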
With this knowledge, the above code should be straightforward to parse:
1) x creates an example vector of dates stored as strings in different formats (plus an empty string).
2) x1 parses the first date format correctly, but fails to parse the second date, giving NA.
3) x2 parses the second date format correctly, but fails to parse the first date.
4) Then two nested ifelse calls combine the correctly parsed dates into a single vector. The first ifelse checks whether the first date format was parsed correctly (!is.na(x) means "wherever the value of x is not NA") and returns the non-missing values from x1. Where x1 is missing, it uses the second ifelse, which returns the non-missing values from x2; where x2 is also missing, it falls back to x1, which is NA there, so the final result is missing at those positions.
5) The nested ifelse calls return the dates as plain numbers rather than as Date objects, because ifelse() builds its result from the logical condition vector and drops the "Date" class. So I wrap the result of 4) in as.Date to get nicely formatted dates back. When converting numbers to dates, R needs an origin (mandatory before R 4.3.0): a Date is stored as the number of days since an origin, which in R is January 1, 1970.
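If you have more than two formats, the nesting can be avoided altogether. The sketch below uses a different technique from the ifelse approach above: it tries each format in turn and fills in only the still-missing positions by indexed assignment. The helper name parse_dates_multi is hypothetical. Assigning into a Date vector with [<- keeps the Date class, so no origin is needed.

```r
# Hypothetical helper: try each format in order, keep the first successful parse
parse_dates_multi = function(x, formats) {
  # Start with an all-NA Date vector of the same length as x
  result = as.Date(rep(NA_character_, length(x)), format = "%Y-%m-%d")
  for (fmt in formats) {
    todo = is.na(result)            # positions not parsed yet
    if (!any(todo)) break           # everything parsed, stop early
    result[todo] = as.Date(x[todo], format = fmt)
  }
  result
}

x = c("2012-01-01", "13/01/2012", "")
res = parse_dates_multi(x, c("%Y-%m-%d", "%d/%m/%Y"))
res   # "2012-01-01" "2012-01-13" NA
```

The order of formats matters: a string that matches several formats is parsed with the first one listed.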
Hope this helps.