1

So there's many problems on StackOverflow with the RODBC problem, but I haven't seen this specific variant of attempting to append to an AZURE SQL db. I really think there should be a parameter that allows you to identify the KEY and NOT try to load into that... I'd prepare a pull request, but RODBC doesn't have a dev branch on github? Anyway, I'll post my problem and what I tried to do, and then my nasty workaround.

I have my data in a table called ActDF.new Here are the properties:

str(ActDF.new)
'data.frame':   52 obs. of  10 variables:
 $ Date          : Date, format: "2016-03-23" "2016-03-23" "2016-03-23" "2016-03-23" ...
 $ Project       : Factor w/ 1 level "x": 1 1 1 1 1 1 1 1 1 1 ...
 $ IndName       : Factor w/ 26 levels "x x...etc",..: 2 17 1 4 11 12 8 3 25 6 ...
 $ IndNum        : num  1 2 3 4 5 6 7 8 9 10 ...
 $ ProjectYear   : Factor w/ 2 levels "bla","blabla": 1 1 1 1 1 1 1 1 1 1 ...
 $ Value         : num  NA NA NA NA 4883 ...
 $ NoteTitle     : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
 $ NoteAnnotation: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1 ...
 $ ID            : num  1 1 1 1 1 1 1 1 1 1 ...
 $ CorpCode      : ch

I want to APPEND this info into a db, with a newish table called Actuals. So I tried using RODBC::sqlSave to accomplish this. Below find the blow-by-blow:

Connect to the db

d <- "Actuals RW"
p <- "xx"
u <- "xx"
channel <- odbcConnect(d,u,p)

Get number of rows on the db (to know where the KEY should start)

PresentNum <- sqlQuery(channel, 'SELECT count(*) FROM Actuals', rows_at_time = 5)
PresentNum <- PresentNum[1,1]

OK, so let's add an ID to my DF, but also, there's tons of NAs for Value... and I don't need those, so let's convert to a better DF

## Initialize ID on this df
ActDF.new$ID <- 1

## Remove NAs from ActDF.new, and organize
toSave <- ActDF.new %>% filter(!is.na(Value)) %>%
  select (ID,Date,Project,FiscalYear=ProjectYear,IndNum,IndName,CorpCode,CurrentValue=Value,NoteTitle,NoteAnnotation)

## And now issue correct numbers to the ID
toSave$ID <- (PresentNum+1):(nrow(toSave)+PresentNum)

There are many blank values, so let's convert those to NA (this is an annoying way to do it... I know)

toSave <- 
  apply(toSave, 2, function(x) gsub("^$|^ $", NA, x))  %>% as.data.frame()

## Now everything is a factor, convert to correct format
toSave$ID <- as.numeric(toSave$ID)
toSave$Date<- as.Date(toSave$Date)
toSave$Project<- as.character(toSave$Project)
toSave$FiscalYear<- as.character(toSave$FiscalYear)
toSave$IndNum<- as.character(toSave$IndNum)
toSave$IndName<- as.character(toSave$IndName)
toSave$CorpCode<- as.character(toSave$CorpCode)
toSave$CurrentValue<- as.numeric(toSave$CurrentValue)
toSave$NoteTitle<- as.character(toSave$NoteTitle)
toSave$NoteAnnotation <- as.character(toSave$NoteAnnotation)

### OK, we're ready! So try to append! ###

################# Test 1 TRY APPENDING DATA AS IS
sqlSave(channel, toSave, tablename = 'Actuals', append = T,
        rownames = F, colnames = F, verbose = T,
        safer = T, addPK = F, 
        fast = T, test = F)
### RETURNS ERROR: Cannot insert explicit value for identity column in table 'Actuals' when IDENTITY_INSERT is set to OFF.

Huh... fair enough...

################# Test 2 TRY TO CHANGE THE IDENTITY_INSERT PROPERTY  

sqlQuery(channel, "Set IDENTITY_INSERT Actuals ON", errors = TRUE)
### RETURNS ERROR: Cannot find the object \"Actuals\" because it does not exist or you do not have permissions."

Oh... but... wait, what? The table for sure exists and I have RW rights. So maybe IDENTITY_INSERT is different somehow... what's the status anyway?

sqlQuery(channel, "SELECT OBJECTPROPERTY(OBJECT_ID('Actuals'), 'TableHasIdentity')")
### RETURNS 1. 

Huh. Don't know what that means... let's try again w/ different addPK settings

#################  Test 3, try to use addPK = TRUE to see if it makes difference.
sqlSave(channel, toSave, tablename = 'Actuals', append = T,
        rownames = F, colnames = F, verbose = T,
        safer = T, addPK = T, 
        fast = T, test = F)
### RETURNS ERROR: Cannot find the object \"Actuals\" because it does not exist or you do not have permissions."

That's not annoying at all. OK FINE, let's remove the ID all-together

################# Test 4, Try to remove the ID
NoID <- toSave[,-grep("ID",names(toSave))]

sqlSave(channel, NoID, tablename = 'Actuals', append = T,
        rownames = FALSE, colnames = FALSE, verbose = T,
        safer = TRUE, addPK = F, 
        fast = T, test = F)
### RETURNS ERROR: Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  : missing columns in 'data'

Oh really? There's missing columns???????? FINE

## So add back in a dummy column
NoID$dummy <- 0
sqlSave(channel, NoID, tablename = 'Actuals', append = T,
        rownames = FALSE, colnames = FALSE, verbose = T,
        safer = TRUE, addPK = F, 
        fast = T, test = F)
### RETURNS ERROR: Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  : missing columns in 'data'

Setting fast=F returns the error: length of 'dimnames' [2] not equal to array extent

OK. gg sqlSave you win, I lose. Here is where I think a modification is in order... checking the SQL query that it's building, we see:

Query: INSERT INTO "Actuals" ( "ID", "Date", "Project", "FiscalYear", "IndNum", "CorpCode", "CurrentValue", "NoteTitle", "NoteAnnotation", "IndName" ) VALUES ( ?,?,?,?,?,?,?,?,? )

I think there needs to be some parameter where I could simply specify that query to NOT try to append to the column ID... right?

Am I missing something?

Amit Kohli
  • 2,860
  • 2
  • 24
  • 44

1 Answers1

1

This was my work around:

################# Test 5, Try issuing the append command manually:

Q <- "INSERT INTO \"Actuals\" ( \"Project\", \"FiscalYear\") VALUES ('test','hello');"
sqlQuery(channel, Q, errors = TRUE)

Ok, so that works! (so I know it wasn't a permission issue). Exactly this format though: It needs DOUBLE quotes for the table/field names, and SINGLE quotes for the data. OK, so now let's try to apply this logic to our real data:

## first drop the dummy again:
NoID <- NoID[,-grep("dummy",names(NoID))]

## Ensure field names are surrounded by a DOUBLE quote, values are surrounded by a SINGLE quote. 
## Separate out the Date field because `paste` converts it to character if it's in with the rest of the data.

Q <- paste(
  "INSERT INTO \"Actuals\"  ( \"Date\", \"Project\", \"FiscalYear\", \"IndNum\", \"IndName\", \"CorpCode\", \"CurrentValue\", \"NoteTitle\", \"NoteAnnotation\" )",
  " VALUES ( '", NoID[1,1], "','", paste(NoID[1,2:ncol(NoID)],collapse="','"),
  "')", sep="")

sqlQuery(channel, Q, errors = TRUE)

Finally!! ok, this works. Now do for all DF, but combining >2 character vectors pairwise is tricky... so:

## first create a character vector for each row, with the quotation marks nicely blended.
crazyD <- ""
for(i in 1:ncol(NoID)){
  crazyD <- paste(crazyD,paste("'",NoID[,i],"'", sep=""),sep="")
} 
crazyD <- gsub("''","','",crazyD)

## And now combine that one with the titles
Q <- paste(
  "INSERT INTO \"Actuals\"  ( \"Date\", \"Project\", \"FiscalYear\", \"IndNum\", \"IndName\", \"CorpCode\", \"CurrentValue\", \"NoteTitle\", \"NoteAnnotation\" ) VALUES ( ",
  crazyD, ")", collapse="; ")

## And push that query into the server
sqlQuery(channel, Q, errors = TRUE)

And that's how I'll do it. I guess, until someone tells me how to do it better. Until then, what will keep me up is: Maximum size for a SQL Server Query? IN clause? Is there a Better Approach

Community
  • 1
  • 1
Amit Kohli
  • 2,860
  • 2
  • 24
  • 44
  • Do you really need to escape all the double quotes, e.g. `\"`? I sympathize, but am unsurprised and can tell you from experience that suggesting improvements/changes to RODBC is essentially futile. – joran Mar 24 '16 at 15:18
  • As far as I know, you can either "do some 'specific' stuff" or viceversa, but since I need BOTH double and single quotes, I don't know how else to do it rather than escaping and escaping... how come RODBC is unresponsive? I don't think my request "Please let me optionally not write to my bound field" is ridiculous, is it? – Amit Kohli Mar 24 '16 at 15:41
  • I'm just saying that in my personal experience, the maintainer for that package has had almost no interest in entertaining changes that increase or improve functionality. I don't know why. Perhaps other people may have better luck than I have. – joran Mar 24 '16 at 15:44
  • I'll try emailing him. Let's see. – Amit Kohli Mar 24 '16 at 15:54
  • @joran it seems he retired from his college due to health. Last update was June 2015, but he's still listed as the maintainer, so perhaps that explains why he's not super reactive? Let's give him some time. – Amit Kohli Mar 24 '16 at 16:29
  • Are you not aware of who he is? BDR is probably one of the most prolific and key contributors to R. I was aware that he'd retired his academic position due to health reasons, but he has regularly been [working](https://github.com/wch/r-source/commits/trunk) on R Core development. His lack of response to me dates back almost two years and he ignored me for a period of nearly a year during which he was actively and regularly contributing to the development of R itself. But, as I said, maybe you'll have more luck. – joran Mar 24 '16 at 16:32