3

Question: how can I convert a factor to a date object without getting NA values.

Here's a similar post: Convert Factor to Date/Time in R

In that post, the user converted to a character object before a date. I am getting NA values when converting to character object using as.character inside the as.Date function.

I have a column in the dataframe with the date in factor format with different numbers of occurrences. Here's the information contained in the data.frame.

> head(fraud, 5)
  TRANSACTION.DATE TRANSACTION.AMOUNT AIR.TRAVEL.DATE POSTING.DATE
1 2/27/14                  25.00                 <NA>          2/28/14
2 2/28/14                  25.00                 <NA>          2/28/14
3 2/27/14                  25.00                 <NA>          2/28/14
4 2/27/14                  20.00              2/27/14          2/28/14
5 2/27/14                  12.13                 <NA>          2/28/14

> str(fraud$TRANSACTION.DATE)
 Factor w/ 519 levels "1/1/14","1/1/15",..: 228 230 228 228 228 230 226 228 230 228 ...

> summary(fraud$TRANSACTION.DATE, 5)
9/30/14 9/17/14 11/4/14 9/23/14 (Other) 
    197     187     171     160   19221 

Converting the factor to a date object resulted in NA values.

> fraud$TRANSACTION.DATE <- as.Date(as.character(fraud$TRANSACTION.DATE), 
+                                       format = "%m/%d/%Y")
> head(fraud$TRANSACTION.DATE, 5)
[1] NA NA NA NA NA

Checking if the as.character function worked.

> fraud$TRANSACTION.DATE <- as.character(fraud$TRANSACTION.DATE)
> head(fraud$TRANSACTION.DATE)
[1] NA NA NA NA NA NA

EDIT: I used as.Date function but got the wrong formatting

> fraud$TRANSACTION.DATE <- as.Date(fraud$TRANSACTION.DATE, format = "%m/%d/%Y")
> str(fraud$TRANSACTION.DATE)
 Date[1:19936], format: "0014-02-27" "0014-02-28" "0014-02-27" "0014-02-27" "0014-02-27" ...
> head(fraud$TRANSACTION.DATE, 5)
[1] "0014-02-27" "0014-02-28" "0014-02-27" "0014-02-27" "0014-02-27"

EDIT 2: Here's the dput value

> dput(droplevels(head(fraud$TRANSACTION.DATE)))
structure(c(1L, 2L, 1L, 1L, 1L, 2L), .Label = c("2/27/14", "2/28/14"
), class = "factor")

Solution: using %y instead of %Y

> fraud$TRANSACTION.DATE <- as.Date(fraud$TRANSACTION.DATE, "%m/%d/%y")
> head(fraud$TRANSACTION.DATE, 5)
[1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27"
Community
  • 1
  • 1
Scott Davis
  • 983
  • 6
  • 22
  • 43
  • 3
    You are using `$d` instead of `%d` . Also `as.Date` will be enough i.e. as.character may not be needed. i.e. `as.Date(fraud$TRANSACTION.DATE, '%m/%d/%y') #[1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27"` – akrun Jul 28 '15 at 15:56
  • I had seen your edit, but using the data copied from your post, I am not getting any NAs. It may be better if you use `dput` to show the data for reproducibility. – akrun Jul 28 '15 at 16:07
  • I performed the edit, but still got NA values when using the as.Date function without as.character. I fixed the syntax as well. Is there anyway I give another view of my data? It seems that I am not providing a good enough reproducible example. – Scott Davis Jul 28 '15 at 16:10
  • 1
    a dput output may be more reproducible. i.e. `dput(droplevels(head(fraud)))` – akrun Jul 28 '15 at 16:10
  • I think one needs `as.character` as otherwise the numeric values of the factor are used, i.e., `1:nlevels(fraud$TRANSACTION.DATE)`. Yet, I do not understand why `as.character` isn't working as desired. – Benjamin Hofner Jul 28 '15 at 16:13
  • Probably, you can also check [here](http://stackoverflow.com/questions/15566875/as-date-returning-na-in-r) – akrun Jul 28 '15 at 16:14
  • 1
    @BenjaminHofner I did try with `factor` column and it worked well without converting to `character`. I am using R 3.2.1 – akrun Jul 28 '15 at 16:15
  • @akrun: True. Nice move of the `as.Date` function. – Benjamin Hofner Jul 28 '15 at 16:17
  • I made another edit, the as.Date function worked and converted the factor to a date object. The mistake earlier was not reloaded the dataset before applying as.Date. Now, the only problem is the formatting for date. – Scott Davis Jul 28 '15 at 16:17
  • @ScottDavis I am not sure how you are getting different date value. – akrun Jul 28 '15 at 16:20
  • @akrun I put another edit with the dput value. Does that help? – Scott Davis Jul 28 '15 at 16:23
  • It is working fine for me. If v1 is the object. `as.Date(v1, '%m/%d/%y')# [1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27" [6] "2014-02-28"` It must be related to something in the `setlocale` as in the link – akrun Jul 28 '15 at 16:25
  • There is something odd here. Please provide the output suggest by @akrun plus `str(fraud)`. I can't reproduce the problem with the Dates returning `NA`s with the same dates as you showed. – Gavin Simpson Jul 28 '15 at 16:27
  • 1
    @BenjaminHofner You don't need `as.character()` as there is an S3 method for class `"factor"` which effectively calls `as.Date(as.character(x))` for you. – Gavin Simpson Jul 28 '15 at 16:29
  • @GavinSimpson I am not getting the NA values anymore after akrun's suggestion. Now, I am getting a different format for the dates than "2014-02-27". – Scott Davis Jul 28 '15 at 16:31
  • 1
    @ScottDavis To format the dates (R will use the YYYY-MM-DD for *display*) look at `strftime()` for details. Note however that you want to keep it as a `"Date"` object and only format when needed for plotting or in tables that you output. – Gavin Simpson Jul 28 '15 at 16:31
  • 1
    Change to using `%y` instead of `%Y` and you should be OK. What happens with `%Y` when you don't have year with century seems OS dependent. I see something different on Linux (Fedora 22) for example. Adding longer answer now. – Gavin Simpson Jul 28 '15 at 16:33
  • @GavinSimpson, you are saying now that I have it in a date object, I can change the format to YYYY-MM-DD ? Ok, I'll try it. – Scott Davis Jul 28 '15 at 16:33
  • 1
    @ScottDavis Normally yes, but your problem is that you are still using the wrong format placeholders. See my answer or previous comment: You want a lowercase `%y`. – Gavin Simpson Jul 28 '15 at 16:36

1 Answers1

5

The problem now is that your format string states the dates include the year with century where your dates only contain the year without century. You need to use the %y placeholder, not the %Y one.

dates <- factor(c("2/27/14","2/28/14","2/27/14","2/27/14","2/27/14"))
as.Date(dates, format = "%m/%d/%y") # correct lowercase y
as.Date(dates, format = "%m/%d/%Y") # incorrect uppercase y

> as.Date(dates, format = "%m/%d/%y")
[1] "2014-02-27" "2014-02-28" "2014-02-27" "2014-02-27" "2014-02-27"
> as.Date(dates, format = "%m/%d/%Y")
[1] "14-02-27" "14-02-28" "14-02-27" "14-02-27" "14-02-27"

Notice R gets it right when you use the correct placeholder; lowercase y.

What happens with %Y when you don't have a year with century seems OS dependent. As you can see on Linux (Fedora 22) I get no padding of the year part whereas you are seeing zero-padding.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453