2

I am attempting to format a column of dates and time (datetime) from my file "f845" using as.POSIXct in R. The file 845 has 21 lines to skip before formatting as seen in the first line of code below, it also has two columns and 100000 rows.

When I attempt to format the datetime column with the second line of code below I end up with the output as seen in the data below the 3rd line.

How come my code is not formatting the date and time properly?

> dat=read.table(f845,sep="\t",skip=21,col.names=c("datetime","light"))
> dat$datetime=as.POSIXct(strptime(dat$datetime,format="%d-%m-%Y %H:%M:%S", tz="UTC"))
> dat[1:10,]
   datetime  time
1      <NA> 4.542
2      <NA> 7.949
3      <NA> 5.678
4      <NA> 7.949
5      <NA> 7.949
6      <NA> 6.813
7      <NA> 2.271
8      <NA> 2.271
9      <NA> 2.271
10     <NA> 2.271

Here is what the data looks like before the formatting:

> dat=read.table(f845,sep="\t",skip=21,col.names=c("datetime","light"))
> dat[1:10,]
              datetime light
1  21/05/2013 22:56:07 4.542
2  21/05/2013 23:01:07 7.949
3  21/05/2013 23:06:07 5.678
4  21/05/2013 23:11:07 7.949
5  21/05/2013 23:16:07 7.949
6  21/05/2013 23:21:07 6.813
7  21/05/2013 23:26:07 2.271
8  21/05/2013 23:31:07 2.271
9  21/05/2013 23:36:07 2.271
10 21/05/2013 23:41:07 2.271

Edit: the output of dput(droplevels (dput(dat[1:10,])) is shown below, there is still a large number of dates so I only show the last few lines:

> dput(droplevels(dat[1:10,]))
structure(list(datetime = structure(1:10, .Label = c("21/05/2013 22:56:07", 
"21/05/2013 23:01:07", "21/05/2013 23:06:07", "21/05/2013 23:11:07", 
"21/05/2013 23:16:07", "21/05/2013 23:21:07", "21/05/2013 23:26:07", 
"21/05/2013 23:31:07", "21/05/2013 23:36:07", "21/05/2013 23:41:07"
), class = "factor"), light = c(4.542, 7.949, 5.678, 7.949, 7.949, 
6.813, 2.271, 2.271, 2.271, 2.271)), .Names = c("datetime", "light"
), row.names = c(NA, 10L), class = "data.frame")
JC11
  • 473
  • 1
  • 8
  • 20
  • 3
    Try `as.POSIXct(dat$datetime, format="%d/%m/%Y %H:%M:%S")` You used `-` instead of `/` – akrun Oct 17 '14 at 15:21
  • I am still receiving "NA" in all the rows. – JC11 Oct 17 '14 at 15:25
  • 1
    Could you post the output of `dput(dat[1:10,])` – akrun Oct 17 '14 at 15:27
  • It does work on the first 10 lines, I neglected to mention that the file has 100000 rows. I have added that info to the question. – JC11 Oct 17 '14 at 15:28
  • It may be better to check whether you have multiple formats in the file. ie. some rows with `21/05/2013...` some with `21-05-2013..`. Check if `sum(!grepl("[/]", dat$datetime))` >0 – akrun Oct 17 '14 at 15:30
  • I think you pasted only a part of `dput(dat[1:10,])` output (which is not useful) – akrun Oct 17 '14 at 15:33
  • I receive a "FALSE" with `sum(!grepl("[/]", dat$datetime)) >0` – JC11 Oct 17 '14 at 15:34
  • 1
    Not posting the full results of `dput()` completely defeats the purpose. It seems you are reading your date time values in as factors which is probably not what you want. But try `droplevels(dput(dat[1:10,])` to get rid of extra levels. – MrFlick Oct 17 '14 at 15:34
  • In this simple test, everything seems to work: `ok<-read.table(text=c("21/05/2013 22:56:07\t4.542","21/05/2013 23:01:07\t7.949 "), sep="\t", col.names=c("datetime","light")); as.POSIXct(ok$datetime, format="%d/%m/%Y %H:%M:%S")`. How is your data different? – MrFlick Oct 17 '14 at 15:35
  • I entered in `ok<-read.table(f845, sep="\t", skip=21,col.names=c("datetime","light")); as.POSIXct(ok$datetime, format="%d/%m/%Y %H:%M:%S")` and it is formatting fine now. I will add drop levels to the question. – JC11 Oct 17 '14 at 15:42
  • @JC11 I can have a look at the file if you can share it in dropbox. – akrun Oct 17 '14 at 15:50
  • 1
    And sorry, i meant `dput(droplevels(dat[1:10,]))`. That should make the output much shorter so you can post the whole thing. Posting a partial `dput` is never helpful. See [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Oct 17 '14 at 15:55
  • @akrun https://www.dropbox.com/sh/uoi9hxkykhn3zct/AAB1QYnYNjXgj_vp3ICm7BLya?dl=0 – JC11 Oct 17 '14 at 16:31
  • Thank you akrun and Mr Flick, it is working fine. Although I suspected this would solve another issue I was having, but it is not. http://stackoverflow.com/questions/26370985/error-reading-dataset-regarding-differing-number-of-rows-in-r – JC11 Oct 17 '14 at 16:44

1 Answers1

1

The initial error might be due to using - instead of / in the format.

 dat <- read.table("D845.lux",sep="\t",skip=21,
        col.names=c("datetime","light"), stringsAsFactors=FALSE)

  dim(dat)
 #[1] 100247      2

  Datetime <- as.POSIXct(dat$datetime, format="%d/%m/%Y %H:%M:%S")
  head(Datetime)
  #[1] "2013-05-21 22:56:07 EDT" "2013-05-21 23:01:07 EDT"
  #[3] "2013-05-21 23:06:07 EDT" "2013-05-21 23:11:07 EDT"
  #[5] "2013-05-21 23:16:07 EDT" "2013-05-21 23:21:07 EDT"

  any(is.na(Datetime))
 #[1] FALSE
akrun
  • 874,273
  • 37
  • 540
  • 662