0

Background

I am using R Studio to connect R to Microsoft SQL Sever Management Studio. I am reading tables into R as follows:

library(sqldf)
library(DBI)
library(odbc)
library(data.table)

TableX <- dbGetQuery(con, statement = "SELECT * FROM [dim1].[dimA].[TableX]")

Which for some tables works fine. However for most tables which have a binary ID variable the following happens:

TableA <- dbGetQuery(con, statement = "SELECT * FROM [dim1].[dimA].[TableA]")
Error in result_fetch(res@ptr, n) : 
  nanodbc/nanodbc.cpp:xxx: xxxxx: [Microsoft][ODBC SQL Server Driver]Invalid Descriptor Index 
Warning message:
In dbClearResult(rs) : Result already cleared

I figured out that the problem is caused by the first column, which I can select like this:

TableA <- dbGetQuery(con, statement = "SELECT ID FROM [dim1].[dimA].[TableA]")

and looks as follows:

enter image description here

AlwaysLearning mentioned in the comments that this is a recurring problem (1, 2, 3). The query only works when ID is selected last:

TableA <- dbGetQuery(con, statement = "SELECT AEE, ID FROM [dim1].[dimA].[TableA]")

Updated Question

The question is essentially how I can read in the table with the ID variable last, without specifying all table variables each time (because this would be unworkable).

Possible Workaround

I thought a work around could be to select ID as an integer:

TableA <- dbGetQuery(con, statement = "SELECT CAST(ID AS int), COL2 FROM [dim1].[dimA].[TableA]")

However how do I select the whole table in this case?

I am an SQL beginner, but I thought I could solve it by using something like this (from this link):

TableA <- dbGetQuery(con, statement = "SELECT * EXCEPT(ID), SELECT CAST(ID AS int) FROM [[dim1].[dimA].[TableA]")

Where I select everything but the ID column, and then the ID column last. However the solution I suggest is not accepted syntax.

Other links

A similar problem for java can be found here.

Tom
  • 2,173
  • 1
  • 17
  • 44
  • You are not the first person to [encounter](https://github.com/r-dbi/odbc/issues/10) [this](https://github.com/r-dbi/odbc/issues/86) [issue](https://github.com/r-dbi/odbc/issues/309). Have you tried reordering the columns in your `SELECT` query to place that binary columns at the end? – AlwaysLearning Sep 27 '22 at 10:03
  • @AlwaysLearning Thank you for your comment. I went through the links, and switching the `ID` column to the last one indeed works. The problem however is that I have still not found a solution for how to automatically put the `ID` column last without specifying all columns of the table. I will however update my question with this information. Thank you. – Tom Sep 27 '22 at 13:26

1 Answers1

0

I believe I have found a workaround that meets your requirements using a table alias.

By assigning the alias T to the table I want to query, it allows me to select both a specific column ([ID]) as well as all columns in the aliased table without the need to explicitly specify them all by name.

This returns all columns of the table (including the ID column) as well as a copy of the ID column at the end of the table.

I then remove the ID column from the resulting table.

This leaves you with the desired result: all columns of a table in the order that they appear with the exception of the ID column that is placed at the end.

PS: For the sake of completeness, I have provided a template of my own DBIConnection object. You can substitute this with the specifics of your own DBIConnection object.

library(sqldf)
library(DBI)
library(odbc)
library(data.table)

con <- dbConnect(odbc::odbc(),
                  .connection_string = 'driver={YourDriver};
                 server=YourServer;
                 database=YourDatabase;
                 Trusted_Connection=yes'
                  )

dataframe <- dbGetQuery(con, statement= 'SELECT T.*, T.[ID] FROM [SCHEMA_NAME].[TABLE_NAME] AS T')

dataframe_scoped <- dataframe[,-1]
Vermundir
  • 51
  • 1
  • 3