109

Question:

Is there a way to specify the Date format when using the colClasses argument in read.table/read.csv?

(I realise I can convert after importing, but with many date columns like this, it would be easier to do it in the import step)


Example:

I have a .csv with date columns in the format %d/%m/%Y.

dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))

This gets the conversion wrong. For example, 15/07/2008 becomes 0015-07-20.


Reproducible code:

data <- 
structure(list(func_loc = structure(c(1L, 2L, 3L, 3L, 3L, 3L, 
3L, 4L, 4L, 5L), .Label = c("3076WAG0003", "3076WAG0004", "3076WAG0007", 
"3076WAG0009", "3076WAG0010"), class = "factor"), order_type = structure(c(3L, 
3L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 1L), .Label = c("PM01", "PM02", 
"PM03"), class = "factor"), actual_finish = structure(c(4L, 6L, 
1L, 2L, 3L, 7L, 1L, 8L, 1L, 5L), .Label = c("", "11/03/2008", 
"14/08/2008", "15/07/2008", "17/03/2008", "19/01/2009", "22/09/2008", 
"6/09/2007"), class = "factor")), .Names = c("func_loc", "order_type", 
"actual_finish"), row.names = c(NA, 10L), class = "data.frame")


write.csv(data,"data.csv", row.names = F)                                                        

dataImport <- read.csv("data.csv")
str(dataImport)
dataImport

dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
str(dataImport)
dataImport

And here's what the output looks like:

code output

smci
  • 32,567
  • 20
  • 113
  • 146
Tommy O'Dell
  • 7,019
  • 13
  • 56
  • 69
  • A hackish way to do this would be to create your own version of `read.table` and add a `format` argument that is passed on to `as.Date`. I wouldn't be surprised if there's a better way that I'm not thinking of, though. – joran Oct 23 '12 at 01:32

4 Answers4

169

You can write your own function that accepts a string and converts it to a Date using the format you want, then use the setAs to set it as an as method. Then you can use your function as part of the colClasses.

Try:

setAs("character","myDate", function(from) as.Date(from, format="%d/%m/%Y") )

tmp <- c("1, 15/08/2008", "2, 23/05/2010")
con <- textConnection(tmp)

tmp2 <- read.csv(con, colClasses=c('numeric','myDate'), header=FALSE)
str(tmp2)

Then modify if needed to work for your data.

Edit ---

You might want to run setClass('myDate') first to avoid the warning (you can ignore the warning, but it can get annoying if you do this a lot and this is a simple call that gets rid of it).

Greg Snow
  • 48,497
  • 6
  • 83
  • 110
  • 4
    Note that you might get a 'no definition for class "myDate"' warning as detailed in [this question](http://stackoverflow.com/questions/14146341/using-setas-to-specify-colclasses-in-r). – Danny D'Amours Jan 10 '13 at 20:35
  • 1
    What is `setMethod('myDate')` supposed to do? Running it just gives me an error... – Josh O'Brien Aug 30 '13 at 17:10
  • 1
    @JoshO'Brien, sorry that should have been `setClass` (fixed now). What it does is prevent `setAs` from issuing a warning about 'myDate' not existing as a class. The warning is harmless and everything still works, but setting the class means that you don't even see the warning. – Greg Snow Aug 30 '13 at 17:47
  • Is it better to just save it as character and then do the conversion of the column to date, or do it as mentioned above? – MySchizoBuddy Mar 31 '14 at 17:52
  • 1
    @MySchizoBuddy, If you only have one date column and you are doing this one time then it probably does not matter which way you do it. But if you have several columns in your dataset that are dates, then I think this approach would probably be simpler than changing each of the columns after reading. – Greg Snow Mar 31 '14 at 18:49
  • 1
    For those like me who came this thread hoping to implement it within `data.table::fread`, `fread` doesn't implement date types in it's conversions so this will still not work at time of this writing in `fread`. – jks612 Feb 04 '19 at 19:37
25

If there is only 1 date format you want to change, you could use the Defaults package to change the default format within as.Date.character

library(Defaults)
setDefaults('as.Date.character', format = '%d/%M/%Y')
dataImport <- read.csv("data.csv", colClasses = c("factor","factor","Date"))
str(dataImport)
## 'data.frame':    10 obs. of  3 variables:
##  $ func_loc     : Factor w/ 5 levels "3076WAG0003",..: 1 2 3 3 3 3 3 4 4 5
##  $ order_type   : Factor w/ 3 levels "PM01","PM02",..: 3 3 1 1 1 1 2 2 3 1
##  $ actual_finish: Date, format: "2008-10-15" "2009-10-19" NA "2008-10-11" ...

I think @Greg Snow's answer is far better, as it does not change the default behaviour of an often used function.

mnel
  • 113,303
  • 27
  • 265
  • 254
8

In case you need time also:

setClass('yyyymmdd-hhmmss')
setAs("character","yyyymmdd-hhmmss", function(from) as.POSIXct(from, format="%Y%m%d-%H%M%S"))
d <- read.table(colClasses="yyyymmdd-hhmmss", text="20150711-130153")
str(d)
## 'data.frame':    1 obs. of  1 variable:
## $ V1: POSIXct, format: "2015-07-11 13:01:53"
Mark Rajcok
  • 362,217
  • 114
  • 495
  • 492
2

A long time ago, in the meantime the problem has been solved by Hadley Wickham. So nowadays the solution is reduced to a oneliner:

library(readr)
data <- read_csv("data.csv", 
                  col_types = cols(actual_finish = col_datetime(format = "%d/%m/%Y")))

Maybe we want even to get rid of unnecessary stuff:

data <- as.data.frame(data)
Andri Signorell
  • 1,279
  • 12
  • 23