2

R version: 3.1.0

The following threads have been read: - format a Date column in a Data Frame - Convert data frame with date column to timeseries

And several other information about this topic without any luck.

I simply need to import a data frame from a .csv file, and I need the index to be a date column as specified by the Date column.

df <- read.csv(sti, header=TRUE)
df$Date <- as.Date(df$Date, format="%Y%m%d")

(I've tried several other dateformats)

This should be fairly simple, but the result is either a column filled with N/As, a column with wrong dates og an error message.

My csv file contains Excel date formats, and I've tried changing the language settings in Windows.

I know it's something with the date format, but I do not know what. Even a dput() of the head seems way off. Here's a shorted version:

structure(list(Date = c("30/12/2013", "27/12/2013", "23/12/2013", 
"20/12/2013", "19/12/2013", "18/12/2013"), MAERSKA = c(11180, 
11150, 10900, 10770, 10670, 10500), WDH = c(527, 522, 515.5, 
515.5, 512, 504.5), VWS = c(160.2, 159.8, 157, 156.6, 156.5, 
153), TRYG = c(524.5, 523, 520.5, 519, 504, 484), TOP = c(142.8, 
142.5, 141.9, 141.5, 139.9, 136.5), TDC = c(52.6, 52.7, 52.45, 
51.95, 51.65, 51.85), Pandora = c(294, 301.2, 304, 300.3, 296.1, 
293.1), NOVOB = c(198.8, 197.2, 195.1, 196.3, 195.6, 190.7), 
    Nordea = c(72.05, 71.35, 70.6, 70.1, 68.65, 67.8), COLOB = c(359, 
    358.3, 353.4, 353.2, 350.2, 342), CHR = c(215.4, 215.7, 212.3, 
    209.1, 206.3, 204.2), CARLB = c(600, 596, 586.5, 586, 584, 
    573), MAERSKB = c(11770, 11740, 11510, 11310, 11210, 11070
    ), JYSKE = c(292.5, 288, 284.4, 282.8, 276.9, 275.7), GN = c(133.2, 
    132.3, 130.5, 129, 127.8, 126.2), GENMAB = c(212, 214.9, 
    217.4, 222.5, 221.6, 216.7), FLS = c(296.1, 290.3, 280.3, 
    278.1, 273.6, 267.1), DSV = c(177.8, 178.2, 176.8, 174, 171.2, 
    169.3), DANSKE = c(124.4, 124.3, 124.3, 123.7, 121, 120.3
    ), NOVOZYMESB = c(228.9, 229.9, 228.5, 230.4, 219.9, 215.6
    )), .Names = c("Date", "MAERSKA", "WDH", "VWS", "TRYG", "TOP", 
"TDC", "Pandora", "NOVOB", "Nordea", "COLOB", "CHR", "CARLB", 
"MAERSKB", "JYSKE", "GN", "GENMAB", "FLS", "DSV", "DANSKE", "NOVOZYMESB"
), row.names = c(NA, 6L), class = "data.frame")

Additional information:

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base 

I hope someone has an idea for what might cause the problem. Later on I need to convert the dataframe to a zoo object, since the package PerformanceAnalytics needs this as input.

Community
  • 1
  • 1
marcopah
  • 826
  • 2
  • 7
  • 25
  • 3
    I may not understand the question - is the problem just that you need to do `as.Date(df$Date, format="%d/%m/%Y" )` to accommodate your format? – James King May 18 '14 at 12:33
  • 2
    Please remove the "....." that you manually added to the `dput` output, otherwise it cannot be easily copied and pasted. – talat May 18 '14 at 12:38
  • It will exceed the character limitation if i paste the full dput() output. @user3114046 I need to convert the Date-column to a date column for R so I later on can convert my data frame to a zoo object. Read.zoo() is not able to read the csv-file, so I want to read it as a data frame and then convert it afterwards. – marcopah May 18 '14 at 12:54
  • Use `dput(head(df))`. That will include only the first 6 rows of df. – talat May 18 '14 at 12:57
  • He did, but he read the dates as a factor. @Marco, try reading in your data with `stringsAsFactors = FALSE`. – James King May 18 '14 at 12:58
  • Please see edit for new dput. – marcopah May 18 '14 at 13:01
  • @Marco, please try this: `df <- read.csv(sti, header=TRUE, stringsAsFactors = FALSE); df$Date <- as.Date(df$Date, format="%d%m%/Y")`. Does that do what you want? – James King May 18 '14 at 13:02
  • @user3114046 i think you forgot the `/` in the format, so @Marco try `df$Date <- as.Date(df$Date, format="%d/%m/%Y")` – talat May 18 '14 at 13:05
  • @user3114046 unfortunately not. my date column contains N/A when using that format and R doesn't provide any warnings or error messages. – marcopah May 18 '14 at 13:05
  • See beginneR's comment about my missing /. – James King May 18 '14 at 13:06
  • Thanks guys - I got i to work. I did actually try the df$Date <- as.Date(df$Date, format="%d/%m/%Y") before posting. I might have made a mistake before, but I think the stringsAsFactors = FALSE did the trick. – marcopah May 19 '14 at 11:59

1 Answers1

17

Using the data you posted above, the Date column is a character. And since your dates look like "30/12/2013" format should be "%d/%m/%Y" not "%Y%m%d". It is very important to get the format correct to make the conversion. Therefore you should be able to do

df$Date <- as.Date(df$Date, format="%d/%m/%Y")

With the sample data,

> class(df$Date)=="Date"
[1] TRUE
> sum(is.na(df$Date))==0
[1] TRUE

so everything looks good.

MrFlick
  • 195,160
  • 17
  • 277
  • 295