7

I have a data frame where the date is stored as a double, e.g. 1993.09, 1993.10, 1993.11, 1993.12.

I want to convert this into a date format '%Y %m %d' (with days always 1).

As far as I understand, as.Date() wants a string input. However, when I convert my dates to strings with sapply(dates, as.character), the trailing zero disappears (1993.10 becomes "1993.1"), effectively converting October to January and resulting in two Januaries per year.

dates
1993.07 1993.08 1993.09 1993.10 1993.11 1993.12
sapply(dates, as.character)
sub("[.]", " ", dates)
"1993 07" "1993 08" "1993 09" "1993 1"  "1993 11" "1993 12"

Is there a more straightforward way of converting the dates? Or where do I mess up?

dput:

c(1993.01, 1993.02, 1993.03, 1993.04, 1993.05, 1993.06, 1993.07, 
1993.08, 1993.09, 1993.1, 1993.11, 1993.12)
Zlo

4 Answers

13

Your problem is that you have something that is really a character string but looks like a numeric, and you didn't take care of this during import. R doesn't distinguish between 1993.1 and 1993.10. Both are the same number. Thus, as.character(1993.10) returns "1993.1". You need to use a formatting function to make sure that you get two digits after the period, because to as.Date "1993.1" and "1993.01" are the same month.

x <- c(1993.09, 1993.10, 1993.11, 1993.12)
as.Date(sprintf("%.2f.01", x), format = "%Y.%m.%d")
#[1] "1993-09-01" "1993-10-01" "1993-11-01" "1993-12-01"

Of course, x should be imported as a character to begin with.
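As a minimal sketch of importing as character, assuming the data come from a CSV file: the file content and the column names `ym` and `value` below are made up for illustration. Passing colClasses to read.csv keeps the column as text, so the trailing zero survives.

```r
# Hypothetical CSV content; `ym` and `value` are made-up column names.
csv_text <- "ym,value
1993.09,1
1993.10,2
1993.11,3"

# Read the first column as character so R never parses it as a double.
df <- read.csv(text = csv_text, colClasses = c("character", "numeric"))

df$ym
# The trailing zero of "1993.10" is preserved because the column is text.

# Append the day and parse as usual.
df$date <- as.Date(paste0(df$ym, ".01"), format = "%Y.%m.%d")
df$date
```

With a real file, the same colClasses argument would be passed alongside the file path instead of text =.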

Roland
7

If you really do just want to convert it to "Date" class using the first of the month, then Roland's solution seems most direct but there are some other considerations such as whether you might want to use end of month or whether you really want to represent year-months using dates in the first place.

The zoo package has a "yearmon" class which can represent year-months directly without converting them to dates. It also has the as.Date.yearmon method, whose frac= argument can be used to specify the fraction of the way through the month to convert to if you do want "Date" class.

First, make sure that the dates are character strings. The input in the question shows 1993.10 as one of the inputs, so we must make sure that there is a trailing zero. (If the inputs are already character with the trailing zero then this is not a problem. We have assumed the worst case here, i.e. numeric input, so that we need to explicitly convert them to character strings with a trailing 0 if need be.) Now use as.yearmon with format "%Y.%m". Finally use as.Date.yearmon to convert to "Date" class.

Perhaps the biggest advantage of this approach is that we could just leave the result in "yearmon" class, i.e. omit the "as.Date" part and use as.yearmon(sprintf("%.2f", dates), "%Y.%m") or, if the dates were already character strings, dates.ch, with a trailing 0 in the case of "1993.10", just as.yearmon(dates.ch, "%Y.%m"). This represents what you have better, since the day is not really meaningful given that it was not there at the beginning. "yearmon" objects can be plotted and sorted in the expected manner.

Here is the conversion to "Date" class using "yearmon" :

library(zoo)

dates <- c(1993.07, 1993.08, 1993.09, 1993.1, 1993.11, 1993.12) # test input 


 as.Date(as.yearmon(sprintf("%.2f", dates), "%Y.%m")) # 1st of month
 ## [1] "1993-07-01" "1993-08-01" "1993-09-01" "1993-10-01" "1993-11-01" "1993-12-01"

 as.Date(as.yearmon(sprintf("%.2f", dates), "%Y.%m"), frac = 1) # last of month
 ## [1] "1993-07-31" "1993-08-31" "1993-09-30" "1993-10-31" "1993-11-30" "1993-12-31"

or if the test input looks like this:

dates.ch <- c("1993.07", "1993.08", "1993.09", "1993.10", "1993.11", "1993.12") # input 

as.Date(as.yearmon(dates.ch, "%Y.%m"))

as.Date(as.yearmon(dates.ch, "%Y.%m"), frac = 1)
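To illustrate leaving the result in "yearmon" class, here is a small sketch. The input vector is deliberately unsorted and is test data only; it assumes the zoo package is installed.

```r
library(zoo)

# Deliberately unsorted test input (character, with the trailing zero intact).
dates.ch <- c("1993.12", "1993.07", "1993.10")

# Stay in "yearmon" class: no artificial day component is introduced.
ym <- as.yearmon(dates.ch, "%Y.%m")

# Sorting is chronological.
ym_sorted <- sort(ym)
format(ym_sorted, "%Y-%m")
# [1] "1993-07" "1993-10" "1993-12"

# Convert to "Date" only at the point where a day is actually needed.
as.Date(ym_sorted)
```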
G. Grothendieck
2

Use paste0 to add the day and look up the date format codes in ?strptime. If you have trouble with the double-to-string formatting, you could use formatC:

txtfield <- c(1993.01, 1993.02, 1993.03, 1993.04, 1993.05, 1993.06, 1993.07, 
  1993.08, 1993.09, 1993.1, 1993.11, 1993.12)

as.Date(paste0(formatC(txtfield, digits=2, format="f"),".01"), "%Y.%m.%d")

Explanation:

paste0 is a shorthand version of paste that does not insert spaces between the pasted elements.
In formatC, digits specifies the number of digits you want after the decimal mark (in our case we want 2). format tells R which number formatting to use; in our case "f" gives the numbers in the desired xxx.xxx format.
as.Date converts into a native R date format, with "%Y.%m.%d" specifying the full year (4 digits) followed by a dot, followed by the numerical month (2 digits), followed by a dot, followed by the numerical day.

results:

[1] "1993-01-01" "1993-02-01" "1993-03-01" "1993-04-01" "1993-05-01" "1993-06-01"
[7] "1993-07-01" "1993-08-01" "1993-09-01" "1993-10-01" "1993-11-01" "1993-12-01"
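For comparison, a short sketch of how the three conversions treat the same two values. The vector x is test input only.

```r
x <- c(1993.09, 1993.1)

# Default conversion drops the trailing zero.
chr_default <- as.character(x)
chr_default
# [1] "1993.09" "1993.1"

# Fixed-point formatting with two decimals keeps it.
chr_sprintf <- sprintf("%.2f", x)
chr_formatC <- formatC(x, digits = 2, format = "f")
chr_sprintf
# [1] "1993.09" "1993.10"

# Both formatting functions give the same strings here.
identical(chr_sprintf, chr_formatC)
# [1] TRUE
```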
Serban Tanasa
0

You'll need to do some fiddling with strings. The most obvious way (to me*) would be to "pad" the right side of the values with zeroes.

* that's a pretty big caveat

dates <- c(1993.01, 1993.02, 1993.03, 1993.04, 1993.05, 1993.06, 1993.07, 
1993.08, 1993.09, 1993.10, 1993.11, 1993.12)

library(magrittr)
library(stringr)
dates %<>%
  str_pad(width = 7, side = "right", pad = "0") %>%
  paste0(".01") %>%
  as.Date(format = "%Y.%m.%d")

dates
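If you would rather not load magrittr and stringr, the same right-padding idea can be sketched in base R. This is an illustrative equivalent, not part of the approach above; it assumes every value should end up 7 characters wide ("YYYY.MM") and that strrep is available (R 3.3.0 or later).

```r
dates <- c(1993.09, 1993.10, 1993.11)  # test input

# as.character() yields "1993.1" for October, so pad on the right with zeros.
s <- as.character(dates)
padded <- paste0(s, strrep("0", pmax(0, 7 - nchar(s))))
padded
# [1] "1993.09" "1993.10" "1993.11"

# Append the day and parse.
result <- as.Date(paste0(padded, ".01"), format = "%Y.%m.%d")
result
```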
Benjamin