How to use as.Date if there is NA, empty cells, and different formats within the same column?

Question

Inside my data frame, one column contains two date formats (%Y-%m-%d and %d/%m/%Y), empty cells, and NA. My attempt below did not produce the desired outcome. I first tried to convert the column with each format as follows:

a <- as.Date(data$x, "%Y-%m-%d")
b <- as.Date(data$x, "%d/%m/%Y")

Then I tried to combine the two results:

a[is.na(a)] <- b[!is.na(b)]

However, I get an error about a differing number of rows. I believe it's due to the empty cells and NA. Is there a way to combine a and b according to their row number/observation?

Thank you in advance for answering.

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Sotos, Feb 12 '18 at 08:15
What do you want empty cells ('') and NA translated to? Also, please add reproducible data (use `dput()`) — smci, Feb 12 '18 at 08:17
Thanks for the comments, guys. @smci I have no preference, so long as the NA and empty cells do not stop me from manipulating the data. What would you have done if you faced the same problem? I tried changing them to "0000" but realized that they get changed back to NA when I apply as.Date to the data. — Teddy, Feb 13 '18 at 03:04
Well the old-and-trusted way of changing blank or NA dates such that they will be processed as a date and not result in variable-length output would be to change them to some sentinel value which is clearly meaningless, e.g. `9/9/9999`. But still please add say 10 example records of reproducible data. — smci, Feb 13 '18 at 04:10
Will do, please give me some time to work out how to share reproducible data, as I am still trying to understand @kgolyaev's code below. I already spent the morning trying to break it down but am finding it challenging. I'm taking this seriously as I really want to understand it. Thanks for being patient. — Teddy, Feb 13 '18 at 05:46

kgolyaev · Answer 1 · 2018-02-13T18:21:23.213

I think what you need is ifelse:

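x = c("2012-01-01", "13/01/2012", "")
x1 = as.Date(x, format="%Y-%m-%d")  # NA wherever x is not in %Y-%m-%d form
x2 = as.Date(x, format="%d/%m/%Y")  # NA wherever x is not in %d/%m/%Y form
y = as.Date(
  # ifelse() returns plain numbers (days since the origin), not Dates
  ifelse(!is.na(x1), x1,
         ifelse(!is.na(x2), x2, x1)),
  origin = "1970-01-01"
)
y
# [1] "2012-01-01" "2012-01-13" NA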

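I had to wrap the ifelse in as.Date(), otherwise the dates were printed as numbers and not as actual dates. This happens because ifelse strips attributes from its result, including the Date class, so what comes back is the underlying number of days. The origin argument tells as.Date() which day those numbers are counted from.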
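The number-versus-date behaviour can be seen in isolation. The following is a minimal sketch with made-up values (`d`, `kept`, and `restored` are illustrative names, not part of the code above):

```r
# Illustrative values only; not taken from the question's data.
d <- as.Date(c("2012-01-01", NA))

# ifelse() hands back a bare numeric vector: the Date class is gone.
kept <- ifelse(!is.na(d), d, NA)
class(d)     # "Date"
class(kept)  # "numeric"

# The numbers count days since 1970-01-01, so supplying that origin
# turns them back into dates.
restored <- as.Date(kept, origin = "1970-01-01")
```

Printing `restored` should show the original date again, with the NA left in place.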

Edit

Here is how the above code works. ifelse(condition, value_if_true, value_if_false) is a vectorized function that expects three vectors of the same length as arguments. (If one of these arguments is shorter, for example a single value, R automatically repeats it the proper number of times; the R term for this is "recycling".) The execution of ifelse() is as follows:

a) For every element in condition, check whether it is TRUE or FALSE.

b) If condition is TRUE, take the corresponding value from value_if_true vector.

c) Otherwise, if condition is FALSE, take the corresponding value from value_if_false vector.
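Steps a) to c) can be tried on a toy example. This is only a sketch with hypothetical values; the variable names mirror the argument names used in the description above:

```r
# Hypothetical example values, chosen only to illustrate steps a) to c).
condition      <- c(TRUE, FALSE, TRUE)
value_if_true  <- c("a", "b", "c")
value_if_false <- c("x", "y", "z")

# Elements 1 and 3 come from value_if_true, element 2 from value_if_false.
picked <- ifelse(condition, value_if_true, value_if_false)
picked    # "a" "y" "c"

# A single value is repeated to match the length of the condition.
repeated <- ifelse(condition, "yes", "no")
repeated  # "yes" "no" "yes"
```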

You can also nest ifelse() calls inside each other, which allows a three-way condition check, or more if needed. Be warned that nesting more than a few ifelse calls gets messy quickly.

With this knowledge, the above code should be straightforward to parse:

1) x creates an example vector of dates stored as strings with different formats.

2) x1 parses the first date correctly, and fails to parse the second date.

3) x2 parses the second date format correctly, but fails to parse the first date.

4) Then two nested ifelse calls combine the correctly parsed dates into a single vector. The first ifelse checks whether the first date format was parsed (!is.na(x1) means "wherever the value of x1 is not NA") and returns the non-missing values from x1. Where x1 is missing, the second ifelse takes over: it returns the non-missing values from x2, and where x2 is also missing, it puts NA into the final result.

5) The nested ifelse calls return plain numbers rather than dates, because ifelse drops the Date class from its result, so I wrap the result of 4) in as.Date to get nicely formatted dates back. When you convert numbers to dates, R needs an origin: the numbers are interpreted as the number of days since an origin date, which for R's Date class is Jan 1, 1970. Supplying it explicitly works in every R version.
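As an aside to step 5, the conversion back from numbers can be avoided by filling the gaps with logical indexing, which is close to what the question attempted. This is a sketch of an alternative, not the method used above; the extra NA in `x` and the name `missing_in_a` are assumptions added for illustration. The question's version indexed the two sides with different conditions, so they generally select different numbers of elements; using the same index on both sides keeps the rows aligned:

```r
# Example input; the trailing NA is added here for illustration.
x <- c("2012-01-01", "13/01/2012", "", NA)

a <- as.Date(x, format = "%Y-%m-%d")
b <- as.Date(x, format = "%d/%m/%Y")

# Use the SAME logical index on both sides so positions line up.
missing_in_a <- is.na(a)
a[missing_in_a] <- b[missing_in_a]

a  # still a Date vector; empty and NA entries stay NA
```

Because the assignment happens on a Date vector, the class is kept and no origin is needed.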

Hope this helps.

Your method worked, although I don't understand the reasoning behind it as I'm still relatively new to R and haven't gotten a grasp of most concepts, but major thanks!! If you have the time, could you explain the usage of ifelse here? Thank you in advance. — Teddy, Feb 13 '18 at 03:08
Appreciate your time and effort to explain this to me. You're a good person! Thanks a heap! — Teddy, Feb 20 '18 at 09:50
