15

So this is VERY strange. RODBC seems to drop the time portion of DateTime SQL columns if the result set is large enough. (The queries are running against an SQL Server 2012 machine, and, yes, when I run them on the SQL Server side they produce identical and proper results, regardless of how many rows are returned.)

For example, the following works perfectly:

myconn <- odbcConnect(dsnName, uid, pwd)
results <- sqlQuery(myconn, "SELECT TOP 100 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)

In R, the following works as expected:

> results$MyDateTimeColumn[3]
[1] "2013-07-01 00:01:22 PDT"

which is a valid POSIXct date time. However, when somewhere between 10,000 and 100,000 rows are returned, suddenly the time portion disappears:

myconn <- odbcConnect(dsnName, uid, pwd)
bigResults <- sqlQuery(myconn, "SELECT TOP 100000 MyID, MyDateTimeColumn from MyTable ORDER BY MyDateTimeColumn DESC")
close(myconn)

(same code, simply a larger number of rows returned; NOTE: the exact same row has now lost its time component), R responds:

> bigResults$MyDateTimeColumn[3]
[1] "2013-07-01 PDT"

Note that the time is now missing (this is not a different row; it's the exact same row as previous), as the following shows:

>strptime(results$TriggerTime[3], "%Y-%m-%d %H:%M:%S")
[1] "2013-07-01 00:01:22"

>strptime(bigResults$TriggerTime[3], "%Y-%m-%d %H:%M:%S")
[1] NA

Obviously the work-around is either incremental query-with-append or export-to-CSV-and-import-to-R, but this seems very odd. Anyone ever seen anything like this?

Config: I'm using the latest version of RODBC (1.3-10) and can duplicate the behavior on both an R installation running on Windows x64 and an R installation running on Mac OS X 10.9 (Mavericks).

EDIT #2 Adding output of dput() to compare the objects, per request:

> dput(results[1:10,]$MyDateTimeColumn)
structure(c(1396909903.347, 1396909894.587, 1396909430.903, 1396907996.9, 1396907590.02, 1396906077.887, 1396906071.99, 1396905537.36, 1396905531.413, 1396905231.787), class = c("POSIXct", "POSIXt"), tzone = "")

> dput(bigResults[1:10,]$MyDateTimeColumn)
structure(c(1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000, 1396854000), class = c("POSIXct", "POSIXt"), tzone = "")

It would appear that the underlying data are actually changing as a result of the number of rows returned by the query, which is downright strange.

jbeldock
  • 2,755
  • 3
  • 18
  • 31
  • Maybe try [RJDBC](http://cran.r-project.org/web/packages/RJDBC/index.html)? – Joshua Ulrich Apr 08 '14 at 09:22
  • In case it's unclear to other people, the "no time" timestamps are start-of-day for a +0700 GMT timezone. So, is the db in Cambodia, Indonesia, Thailand, Vietnam (or possibly some others)? How does the backend db actually store this information (integer vs float vs string); is time a separate piece from the date information internally? Because otherwise the db has to be doing _more_ work to do this, which is **stranger**. Or maybe just the returned type is different? – Clockwork-Muse Apr 09 '14 at 13:08
  • any answer to this please? same issue here. field is stored as datetime in SQL and RODBC stores it as posixct but removes time stamp. @JoshuaUlrich i cant use RJDBC for my case as it stores field as character and it makes sorting by this field a lot slower. i have lots of data too and converting takes too long too. any ideas please? thank you! – charliealpha Aug 24 '17 at 07:19

8 Answers8

6

sqlQuery() has an option called as.is. Setting this to TRUE will pull everything as seen in for example Microsoft SQL Management Studio.

kristang
  • 557
  • 5
  • 17
  • 1
    No, returning a year's worth of hours (8760) will strip the time component off of my datetime field (not all datetime fields, just the one I'm most interested in [of course]). If I return TOP(3) rows, then I've got my time component regardless of how I set the "as.is" in my sqlQuery() call. – Mark Bergan Nov 04 '17 at 20:25
3

I cope with the same problem as well. Even stranger, on a large dataset one column would import both date and time the other column only imported the date.

My advise would be to split the data/time in SQL

 myconn <- odbcConnect(dsnName, uid, pwd)
 results <- sqlQuery(myconn, "SELECT TOP 100 MyID, format(MyDateTimeColumn,"HH:mm:ss") as MyTimeColumn,format(MyDateTimeColumn,"yyyy-MM-dd") as MyDateColumn  from MyTable ORDER BY MyDateTimeColumn DESC")
 close(myconn)

Then combine them in R afterwards. Hope it helps.

JR Helmus
  • 311
  • 1
  • 12
3

I had the same issue and concluded that it is due to DST:

This fails: as.POSIXct(c("2015-03-29 01:59:22", "2015-03-29 02:00:04"))

This works: as.POSIXct(c("2015-03-29 01:59:22", "2015-03-29 02:00:04"), tz="UTC")

I could not find how to force tz="UTC" in default RODBC behavior, however using as.is = TRUE and converting columns myself does the job.

Note: At first I had the impression that it was due to huge result, but in fact it was due to the fact that in huge results I have more chances to cross DST updates.

Battmanux
  • 61
  • 2
3

This is an older question, but I had similar issues when trying to programmatically read in data from 15 different .accdb. All POSIXct fields were read in correctly for every database except those from the months of March, from which I inferred that it is some sort of daylight-savings time issue.

The solution for me (because I didn't want to have to make multiple queries to a dB and then rbind() everything together) was to alter my function to include the lines

#Get initial tz
current_tz <- Sys.timezone()
Sys.setenv(TZ = "America/Phoenix")

[BODY OF FUNCTION]

Sys.setenv(TZ = curent_tz)

After including these few lines, the day/time fields from the March databases were being read in correctly.

Steven
  • 3,238
  • 21
  • 50
3
sqlQuery(ch, getSQL(sqlquerypath)) 

stripped the times off my datetime column.

sqlQuery(ch, getSQL(sqlquerypath), as.is = TRUE) 

fixed the issue.

Paul Roub
  • 36,322
  • 27
  • 84
  • 93
yasfried
  • 61
  • 2
1

I think this is a case of times being stripped from the dates where the date range includes shifts to/from daylight savings time. If you are selecting periods that don't include daylight savings shifts, the times will be retained (e.g., from 1/1/2007 to 3/1/2007. This could possibly be avoided by changing the system time on your computer to follow a time zone (e.g., Arizona) where there are no daylight savings shifts (sounds bizarre, but it has worked for me).

To overcome this issue, import the DateTimes as characters (using "as.is") and then convert them to POSIXct. You could alternatively use "strptime" which converts to POSIXlt and allows you to specify the format. Here's an example of a SQL query where DateTimes are imported as.is (TRUE), but asociated DataValues are not (FALSE) and then the DateTime is converted to an R date format:

data <- sqlQuery(channel, paste("SELECT LocalDateTime, DataValue FROM DataValues WHERE  LocalDateTime >= '1/1/2007 0:00' AND LocalDateTime < '1/1/2008 0:00' ORDER BY LocalDateTime ASC"),as.is=c(TRUE,FALSE))

data$LocalDateTime <- as.POSIXct(totalP$LocalDateTime,tz="MST")
amberwaves
  • 35
  • 7
0

It may be a daylight saving issue. If there is a time that doesnt exist in your timezone (because of daylight saving) it may cause something like this.

  • umm, no. When DST strikes, it doesn't _remove_ the time, it changes the reported (local time) value. Most systems store time as an incrementing counter - DST doesn't change this value, it simply reports things as an hour (or whatever) different. The differences in values are **far** more than a sane hour (the first one is 15 hours, plus some minutes and seconds). So I'd expect to see _some_ time portion, even if it was `00:00:00`. – Clockwork-Muse Apr 09 '14 at 13:13
0

Why this happens on large datasets returned from sqlQuery()? I don't know. But was able to workaround it by applying a sql conversion in the sql call:

data <- sqlQuery(channel, "SELECT CONVERT(nvarchar(24), DtTimeField, 21) AS HourDt, * FROM ...

This is your workaround.