
I have a long CSV file and want to import only some of its columns (selected via colClasses) together with the corresponding timestamp. I tried two different methods. The first uses my own conversion function (based on this answer). Here is some basic input to reproduce the results:

setClass("myDate")
setAs("myDate", function(from) as.Date(from, format="%d.%m.%Y %H:%M:%S") )

data <- c("15.08.2008 00:00:00,Vienna,bla,142", "23.05.2010 01:00:00,Paris,bla,92")
con <- textConnection(data)

readout <- read.csv(con, colClasses=c('myDate', 'character', 'NULL', 'numeric'), header=FALSE)
print(readout)

However, the output contains only the date, not the time (readout$V1: Date, format: "2008-08-15" "2010-05-23"):

          V1     V2  V4
1 2008-08-15 Vienna 142
2 2010-05-23  Paris  92

I also tried a zoo series. This way the result does contain the time (the data is indexed by the corresponding timestamp), but I don't think it is what I want:

library(zoo)
csv <-
  "timestamp,city,foo,elev
   15.08.2008 00:00:00,Vienna,bla,142
   23.05.2010 01:00:00,Paris,bla,92"
readout = read.zoo(text = csv, sep = ",", header = TRUE, index = "timestamp", format = "%d.%m.%Y %H:%M:%S", tz = "CET")

print(readout)

Which yields:

                    city   foo elev
2008-08-15 00:00:00 Vienna bla 142 
2010-05-23 01:00:00 Paris  bla  92 

What I actually want is the result from my own function but also containing the time, not just the date.

GeoEki

2 Answers


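I changed your code a little: read the file with the default column classes first, then convert the timestamp column with strptime.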

data <- c("15.08.2008 00:00:00,Vienna,bla,142", "23.05.2010 01:00:00,Paris,bla,92")
con <- textConnection(data)

# Read everything with the default column classes first
datafr <- read.csv(con, header = FALSE)

class(datafr)
names(datafr)

datafr

# Parse the timestamp text; strptime() returns a POSIXlt object
datafr$date <- strptime(datafr$V1, format = "%d.%m.%Y %H:%M:%S")
datafr

Then you can rename the columns as you wish.
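As a rough sketch of how the conversion, the renaming and the skipping of unneeded columns might fit together in base R: the column names `timestamp`, `city` and `elev`, and the choice of `tz = "UTC"`, are assumptions made for illustration and are not taken from the original data.

```r
# Two sample rows in the same layout as the question's file.
data <- c("15.08.2008 00:00:00,Vienna,bla,142",
          "23.05.2010 01:00:00,Paris,bla,92")

# Read the timestamp as plain text and skip the third column ("NULL").
datafr <- read.csv(text = data, header = FALSE,
                   colClasses = c("character", "character", "NULL", "numeric"))

# Convert the text to POSIXct. tz = "UTC" is an assumption;
# use the time zone the data was actually recorded in.
datafr$V1 <- as.POSIXct(datafr$V1, format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Rename the three remaining columns (illustrative names).
names(datafr) <- c("timestamp", "city", "elev")
str(datafr)
```

Converting to POSIXct rather than keeping the POSIXlt result of strptime stores each timestamp as a single number, which tends to be the more compact choice for a data frame column.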

Luis Candanedo
  • Thanks, this is definitely the easier solution. My text file is several hundred MB, so I exclude columns with redundant information or data that I simply don't need with `colClasses` (e.g. `datafr <- read.csv(con, colClasses=c('character', 'character', 'NULL', 'numeric'), header = FALSE)`) based on [this reading](http://www.r-bloggers.com/using-colclasses-to-load-data-more-quickly-in-r/). Thanks! – GeoEki Jan 25 '16 at 15:58
  • @GeoEki You should have a look at the `fread` function from package data.table. It is much faster than `read.csv` for larger data. – Roland Jan 26 '16 at 10:54

The asker's approach is quite valid and only needs a minor tweak: I added the from class ("character") to setAs and used as.POSIXct instead of as.Date. The Date class is represented as the number of days since 1970-01-01, which means it cannot hold a time of day.

setClass("myDate")
setAs("character", "myDate", function(from) as.POSIXct(from, format="%d.%m.%Y %H:%M:%S") )

data <- c("15.08.2008 00:00:00,Vienna,bla,142", "23.05.2010 01:00:00,Paris,bla,92")
con <- textConnection(data)
readout <- read.csv(con, colClasses=c('myDate', 'character', 'NULL', 'numeric'), header=FALSE)
print(readout)
#>                    V1     V2  V4
#> 1 2008-08-15 00:00:00 Vienna 142
#> 2 2010-05-23 01:00:00  Paris  92

Created on 2021-02-19 by the reprex package (v1.0.0)
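To illustrate why as.Date loses the time while as.POSIXct keeps it, here is a small sketch comparing the underlying numbers. The sample timestamp and `tz = "UTC"` are assumptions chosen for the illustration.

```r
stamp <- "15.08.2008 13:30:00"
fmt <- "%d.%m.%Y %H:%M:%S"

# as.Date parses the whole string but keeps only the calendar day.
d <- as.Date(stamp, format = fmt)

# as.POSIXct keeps the time of day; tz = "UTC" is an assumption.
p <- as.POSIXct(stamp, format = fmt, tz = "UTC")

unclass(d)             # whole days since 1970-01-01
as.numeric(p)          # seconds since 1970-01-01 00:00:00 UTC
format(p, "%H:%M:%S")  # the time of day is still available
```

If no `tz` is given, as.POSIXct interprets the string in the session's local time zone, so it may be worth setting it explicitly when the data comes from a known zone.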

Jan