
When I use the following `read.zoo` call it works fine until I add the last line (my source is a CSV file, but here it is in a format for reproducing):

library(zoo)
 Lines <- "fdatetime,Consumption
    1,27/03/2015 01:00,0.04
    2,27/03/2015 02:00,0.04"


> z <- read.zoo(text = Lines, tz = "", format = "%d/%m/%Y %H:%M", sep = ",")
Error in read.zoo(text = Lines, tz = "", format = "%d/%m/%Y %H:%M", sep = ",") : 
  index has bad entry at data row 51

What's wrong with the last line? If I delete it, everything works!

> data.table::fread(file.choose(), verbose = TRUE)
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000001 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ','
Detected 3 columns. Longest stretch was from line 2 to line 30
Starting data input on line 2 (either column names or first row of data). First 10 characters: 1,25/03/20
Some fields on line 2 are not type character (or are empty). Treating as a data row and using default column names.
Count of eol: 51 (including 0 at the end)
Count of sep: 102
nrow = MIN( nsep [102] / ncol [3] -1, neol [51] - nblank [0] ) = 51
Type codes (   first 5 rows): 143
Type codes (+ middle 5 rows): 143
Type codes (+   last 5 rows): 143
Type codes: 143 (after applying colClasses and integer64)
Type codes: 143 (after applying drop or select (if supplied)
Allocating 3 column slots (3 - 0 dropped)
Read 51 rows. Exactly what was estimated and allocated up front
   0.000s (  0%) Memory map (rerun may be quicker)
   0.000s (  0%) sep and header detection
   0.000s (  0%) Count rows (wc -l)
   0.001s (100%) Column type detection (first, middle and last 5 rows)
   0.000s (  0%) Allocation of 51x3 result (xMB) in RAM
   0.000s (  0%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.000s (  0%) Coercing data already read in type bumps (if any)
   0.000s (  0%) Changing na.strings to NA
   0.001s        Total
  • Have you checked if your last line has an \r\n (Carriage Return) at the end? maybe there are one or two empty lines at the end of the file. – BerndGit Feb 21 '16 at 14:45
  • The above does not produce an error for me (on Linux). Most likely the issue has to do with carriage return in the original file as pointed out by @BerndGit. – nrussell Feb 21 '16 at 14:48
  • As can be seen there is no carriage return. I tried it when using even 100 and more lines and the problem is in this specific line. Other files with same format works great. – Avi Feb 21 '16 at 14:55
  • @G. Grothendieck, In this case there is no need for header=TRUE – Avi Feb 21 '16 at 14:56
  • Are you able to read the file in with other functions? For example, does `data.table::fread("/path/to/actual/file", verbose = TRUE)` mention anything unusual? – nrussell Feb 21 '16 at 14:59
  • I use ts1<-read.csv (file.choose()) and when I see the content in R and in Notepad++ it looks good with no added spaces or characters and same as other files that were loaded OK. This files works great till I added this specific line. – Avi Feb 21 '16 at 15:01
  • Try using `data.table::fread` with `verbose = TRUE` specifically. Most likely you aren't going to actually *see* characters like `\r\n` by inspecting the file manually. – nrussell Feb 21 '16 at 15:05
  • Please find at the end of question body results for data.table::fread("/path/to/actual/file", verbose = TRUE) – Avi Feb 21 '16 at 15:10
  • @Avi, Good point about header=TRUE not being needed. I just tried it and in fact your `read.zoo` code worked for me so I can't reproduce the problem. – G. Grothendieck Feb 21 '16 at 15:24
  • If you copy the data and code from this question and paste it into your R session does it work in that case? – G. Grothendieck Feb 21 '16 at 15:29
  • No it doesn't work neither by using it as is nor by using it from CSV file. Any suggestion? – Avi Feb 21 '16 at 16:26
  • When I delete line 51 - like a magic, everything is OK. When line 51 is even a larger file - error, amazing!!!!! – Avi Feb 21 '16 at 17:05
  • Could it be related to daylight saving? I see that at least [some countries switched from Standard time to DST `27/03/2015 02:00`](http://www.timeanddate.com/time/dst/2015a.html). DST may cause some surprises (see [some related Q&A](http://stackoverflow.com/search?tab=votes&q=%5br%5d%20daylight%20saving)). – Henrik Feb 21 '16 at 18:32
  • 1
  • See [this Q&A](http://stackoverflow.com/questions/27361500/trouble-finding-non-unique-index-entries-in-zooreg-time-series). Same error as you: "_index has bad entries at data rows_". From the answer: "The referenced rows have timestamps that coincide with the changeover from Standard to DST, so these times do not exist in the US Eastern timezone" – Henrik Feb 21 '16 at 18:42
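Henrik's diagnosis can be checked directly: `27/03/2015 02:00` falls in a spring-forward gap in some timezones (Israel, for example, moved its clocks from 01:59 to 03:00 on that date; the asker's actual local zone is an assumption here). With `tz = ""`, R parses the index in the local zone, and a nonexistent local time can come back as `NA`, which is exactly what `read.zoo` rejects as a bad index entry. A minimal sketch:

```r
# 27 March 2015 02:00 never occurred in a zone whose clocks jumped
# from 01:59 straight to 03:00 that night (Asia/Jerusalem is used as
# an assumed example zone). The result for a time inside the gap is
# platform-dependent: NA on some systems, a shifted time on others.
as.POSIXct("27/03/2015 02:00", format = "%d/%m/%Y %H:%M", tz = "Asia/Jerusalem")

# UTC has no DST, so the same timestamp always parses to a valid time.
as.POSIXct("27/03/2015 02:00", format = "%d/%m/%Y %H:%M", tz = "UTC")
```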

1 Answer

Thanks to @Henrik, the solution is to specify `tz`, as follows:

# ts1 is the data frame read in with read.csv
z <- read.zoo(ts1, tz = "UTC", format = "%d/%m/%Y %H:%M", sep = ",")
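Applied to the reproducible data from the question, the fix looks like this (a sketch; `tz = "UTC"` sidesteps the DST gap that made `tz = ""` fail):

```r
library(zoo)

Lines <- "fdatetime,Consumption
1,27/03/2015 01:00,0.04
2,27/03/2015 02:00,0.04"

# UTC has no daylight saving transitions, so every timestamp parses
z <- read.zoo(text = Lines, tz = "UTC", format = "%d/%m/%Y %H:%M", sep = ",")
z
```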
  • 4
    Since the problem is the switch to daylight savings time you could alternately use chron instead of POSIXct as it has no time zones or daylight savings time: `library(chron); z <- read.zoo(text = Lines, FUN = as.chron, format = "%d/%m/%Y %H:%M", sep = ",")` – G. Grothendieck Feb 21 '16 at 23:06