You are using read.zoo(...) incorrectly. According to the documentation:
To process the index, read.zoo calls FUN with the index as the first
argument. If FUN is not specified then if there are multiple index
columns they are pasted together with a space between each. Using the
index column or pasted index column: 1. If tz is specified then the
index column is converted to POSIXct. 2. If format is specified then
the index column is converted to Date. 3. Otherwise, a heuristic
attempts to decide among "numeric", "Date" and "POSIXct". If format
and/or tz is specified then they are passed to the conversion function
as well.
You are specifying format=..., so read.zoo(...) converts the index to Date, not POSIXct. The time of day is dropped, so every timestamp on the same day collapses to the same date, which leaves many, many duplicated index values.
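To see the collapse in isolation, here is a minimal sketch using base R's as.Date on a few made-up timestamps (they are illustrative, not taken from your data). I believe this is roughly what happens to the index column when only format is given:

```r
# Hypothetical timestamps, all on the same day but at different times
stamps <- c("2012/03/11 01:30", "2012/03/11 02:30", "2012/03/11 03:30")

# Converting to Date parses the string but keeps only the calendar day
d <- as.Date(stamps, format = "%Y/%m/%d %H:%M")
print(d)                  # all three become 2012-03-11

# The time of day is gone, so the values are no longer unique
print(any(duplicated(d))) # TRUE
```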
Naively, the fix would be to use:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M")
# Error in read.zoo(df, FUN = as.POSIXct, format = "%Y/%m/%d %H:%M") :
# index has bad entries at data rows: 507 9243 18147 26883 35619 44355
but as you can see, this does not work either. Here the problem is much more subtle. The index is converted to POSIXct, but in the system time zone (which on my system is US Eastern). The referenced rows have timestamps that fall in the hour skipped at the changeover from Standard Time to DST, so these times do not exist in the US Eastern time zone and cannot be converted. If you use:
df <- read.zoo(df, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC")
the data imports correctly.
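The DST gap can be reproduced without zoo. The sketch below uses a made-up timestamp inside the 2012 US spring-forward hour (an assumption for illustration, not one of your actual rows). How a nonexistent local time is handled depends on the platform and R version; on many systems the first call returns NA, which is what read.zoo then reports as a bad entry:

```r
# Hypothetical timestamp: 02:30 on 2012-03-11 was skipped in US Eastern time
x <- "2012/03/11 02:30"
fmt <- "%Y/%m/%d %H:%M"

# In a zone that observes DST this wall-clock time does not exist;
# the result is platform dependent and is often NA
eastern <- as.POSIXct(x, format = fmt, tz = "America/New_York")
print(eastern)

# UTC has no DST transitions, so every wall-clock time is valid
utc <- as.POSIXct(x, format = fmt, tz = "UTC")
print(utc)
```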
EDIT:
As @G.Grothendieck points out, this would also work, and is simpler:
df <- read.zoo(df, tz="UTC")
You should set tz to whatever time zone is appropriate for the dataset.
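For completeness, a small end-to-end sketch of the tz-only call. The data frame sample_df and its column names are made up for illustration, and I am assuming the index is the first column and is stored as text in a format as.POSIXct recognises by default:

```r
library(zoo)

# Hypothetical data: three readings around the US spring-forward gap
sample_df <- data.frame(
  time  = c("2012/03/11 01:30", "2012/03/11 02:30", "2012/03/11 03:30"),
  value = c(1.0, 2.0, 3.0),
  stringsAsFactors = FALSE
)

# With tz given and no format, the index column is converted to POSIXct
z <- read.zoo(sample_df, tz = "UTC")

print(class(index(z)))  # includes "POSIXct"
print(z)
```

If the timestamps were recorded in local time rather than UTC, passing the matching zone name for tz would be the thing to try instead, keeping in mind that skipped local times may then fail to convert.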