
I have a data frame (roughly 1,000,000 rows and 15 columns) in my local R environment, and I am trying to upload it to a SQL server:

library(odbc)
library(DBI)
library(RODBC)

dbWriteTable(my_connection,  SQL("my_table"), my_table)

But I am not able to upload the entire table to the server; when I run this code, I get the error:

Error in result_insert_dataframe(rs@prt, values):  nanodbc/nanodbc.cpp:1587 : HY008 : Operation canceled

However, I noticed that I am able to successfully upload "parts" of this table onto the server:

#this works fine

my_table_1 = my_table[1:1000,]

dbWriteTable(my_connection,  SQL("my_table_1"), my_table_1)

This leads me to believe that somewhere in "my_table" there are some rows that the server is not accepting, but I am not sure how to correctly identify these rows.

  • Is there some way to find out which rows are causing the problems, or is there perhaps a way to "trick"/"force" the server into accepting these rows?

Here is a summary (from the str command) of the data frame in R (global environment):

- var1: chr
- var2: chr
- var3: chr
- var4: chr
- var5: chr
- var6: chr
- var7: chr
- var8: chr
- var9: chr
- var10: chr
- var11: chr
- var12: num
- var13: num
- var14: num
- var15: chr

And here is a summary of the column types for the first 100 rows of this table, which were successfully uploaded to the SQL server:

- var1: CHARACTER VARYING (255)
- var2: CHARACTER VARYING (255)
- var3: CHARACTER VARYING (255)
- var4: CHARACTER VARYING (255)
- var5: CHARACTER VARYING (255)
- var6: CHARACTER VARYING (255)
- var7: CHARACTER VARYING (255)
- var8: CHARACTER VARYING (255)
- var9: CHARACTER VARYING (255)
- var10: CHARACTER VARYING (255)
- var11: CHARACTER VARYING (255)
- var12: DOUBLE PRECISION
- var13: DOUBLE PRECISION
- var14: DOUBLE PRECISION
- var15: CHARACTER VARYING (255)

Based on these summaries, is there a way to find out which rows are causing the problems, or a way to "trick"/"force" the server into accepting them?

Thank you!

stats_noob

2 Answers

1

It might be that your dataset is too big to write to the database in a single operation. I remember having similar issues a while ago, and it would make sense, since 1000 rows at a time seem to work for you. Splitting the dataset into chunks and writing them one by one might work.

This is untested, but maybe this works:

my_table_chunks <-
  split(my_table, ceiling(seq_len(nrow(my_table)) / 1000))

# note: append = TRUE has to go inside dbWriteTable(), not as an lapply() argument
lapply(my_table_chunks,
       function(x) dbWriteTable(my_connection, SQL("my_table"), x, append = TRUE))

If you have access to the database console or an admin UI, it might be faster to import the data that way.

A similar solution is suggested here in SQL.

Till
  • Thank you! Do you remember if this was the error you were getting? Error in result_insert_dataframe(rs@prt, values): nanodbc/nanodbc.cpp:1587 : HY008 : Operation canceled – stats_noob Mar 10 '22 at 01:36
  • 1
    I think it was something similar. When I read the error message in your question, it reminded me of that situation. It was a long time ago, though. Also, the error message might vary depending on the SQL server used. – Till Mar 10 '22 at 01:38
  • Btw: I don't think the problem is related to size limits ... e.g. my_table_1 = my_table[1:1000,] could be uploaded, but my_table_2 = my_table[4000:5000,] could not. "my_table_1" and "my_table_2" are the same size, but because one of them was uploaded and the other was not, this still leads me to believe the problem is not the table size, but rather some rows not being accepted. I would be curious to hear your thoughts! – stats_noob Mar 10 '22 at 01:38
  • 1
    you can try to go through the 4000 to 5000 range in smaller steps. If the issue stems from certain rows, you should be able to narrow it down that way. Maybe 1000 rows at a time is too big a chunk size for some sections; try chunk sizes of 500, 100, ... instead (a sketch of this follows these comments). – Till Mar 10 '22 at 01:58
  • This is exactly the idea I had ... trying to narrow down the rows that are causing the problem. I was hoping that there might be a better way to solve this problem...or a way to somehow "force" everything on to the server.... – stats_noob Mar 10 '22 at 03:01
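
A minimal sketch of that narrowing-down approach (untested; it assumes the my_connection and my_table objects from the question and writes to a hypothetical, disposable scratch table called test_rows):

library(DBI)

# helper: TRUE if the given rows of my_table can be written to the scratch table
writes_ok <- function(rows) {
  tryCatch({
    dbWriteTable(my_connection, SQL("test_rows"), my_table[rows, ], overwrite = TRUE)
    TRUE
  }, error = function(e) FALSE)
}

# repeatedly halve a failing range until the offending row(s) are isolated
find_bad_rows <- function(rows) {
  if (writes_ok(rows)) return(integer(0))
  if (length(rows) == 1) return(rows)
  half <- length(rows) %/% 2
  c(find_bad_rows(rows[1:half]), find_bad_rows(rows[(half + 1):length(rows)]))
}

find_bad_rows(4000:5000)
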
0

Your database columns all have a maximum length of 255 characters. dbWriteTable will fail if any of your character vectors contain strings longer than that. Use lapply(my_table, function(x) max(nchar(x), na.rm = TRUE)) to find out.
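
For example, a rough sketch (assuming my_table is the data frame from the question) that reports the longest value per column and flags the rows that would not fit into a 255-character column:

# longest value in each column; NAs are ignored, numeric columns are coerced to text
sapply(my_table, function(x) max(nchar(as.character(x)), na.rm = TRUE))

# rows containing at least one value longer than 255 characters
too_long <- apply(my_table, 1, function(r) any(nchar(as.character(r)) > 255, na.rm = TRUE))
which(too_long)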

If that doesn't help, split the data.frame into chunks and use purrr::map in combination with purrr::possibly to loop over them. Then check the output to see which chunks failed, cut those into smaller chunks and try again.
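
A rough, untested sketch of that approach, assuming the my_table_chunks list from the first answer and the my_connection object from the question:

library(DBI)
library(purrr)

# possibly() returns NULL instead of aborting when a chunk fails to upload
safe_write <- possibly(
  function(x) dbWriteTable(my_connection, SQL("my_table"), x, append = TRUE),
  otherwise = NULL
)

results <- map(my_table_chunks, safe_write)

# names of the chunks that failed and deserve a closer look
names(results)[map_lgl(results, is.null)]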

BjaRule