
Basic setup is that I connect to a database A, get some data back to R, write it to another connection, database B.

The database is SQL_Latin1_General_CP1_CI_AS and I'm using encoding = "windows-1252" in connection A and B.

Display in RStudio is fine, special characters show as they should.

When I try to write the data, I get a "Cannot insert the value NULL into column".

I narrowed it down to at least one offending field: a cell with a PHI symbol, which causes the error.

How do I make it so the PHI symbol and presumably other special characters are kept the same from source to destination?

library(DBI)
library(odbc)

conA <- dbConnect(odbc(),
                  Driver = "ODBC Driver 17 for SQL Server",
                  Server = "DB",
                  Database = "serverA",
                  Trusted_connection = "yes",
                  encoding = "1252")

dbWriteTable(conB, SQL("schema.table"), failing_row, append = TRUE)
# This causes the "cannot insert null value" error
jarlh
  • SQL Server usually gives this message when the row you are inserting contains a NULL value in a column that can not contain NULL. Please can you confirm that `failing_row` contains no null values? Or that `schema.table` in database B does not have any columns that are non-null-able? – Simon.S.A. Dec 17 '21 at 20:41
  • Hey Simon, thanks for the reply. The error I get is specifically that one column in the data frame is failing. I've narrowed down the row to failing_row, and that row has a character string in the failing column. That string contains a unicode phi character which is visible in the original database and shows up fine in R. When I remove this character, I can write failing row just fine. – user2932481 Dec 18 '21 at 21:20
  • You say "The database is SQL_Latin1_General_CP1_CI_AS" but you also refer to "database A" and "database B". Do both databases have the same encoding? You can use `SELECT SERVERPROPERTY('Collation')` to check the encoding. – Simon.S.A. Dec 20 '21 at 20:19
  • Yep, same encoding! Database B is in Azure, if that changes anything. – user2932481 Dec 28 '21 at 07:51

1 Answer


I suggest working around this problem rather than trying to fix the encoding handling directly. The overwhelming majority of encoding questions have nothing to do with dbplyr (the encoding tag has 23k questions, while the dbplyr tag has fewer than 400), so this may be a non-trivial problem to resolve within the database packages themselves.

Here are two work-arounds to consider:

Use a text file as an intermediate step

R will have no problem writing an in-memory table out to a text file/csv, and SQL Server has standard ways of reading a text file/csv back in. This also gives you the added advantage of validating the contents of the text file before loading it into SQL.

Documentation for SQL Server BULK INSERT can be found here. This answer gives instructions for using UTF-8 encoding: CODEPAGE = '65001'. And this documentation gives instructions for unicode: DATAFILETYPE = 'widechar'.

If you want to take this approach entirely within R, it will likely look something like:

write.csv(failing_row, "output_file.csv", row.names = FALSE)
# note: DATAFILETYPE = 'widechar' expects a UTF-16 encoded file, so you may
# need to write the csv with fileEncoding = "UTF-16LE"

query_for_creating_table = "CREATE TABLE schema_name.table_name (
    col1 INT,
    col2 NCHAR(10)
)"
# each column listed along with suitable data types

query_for_bulk_insert = "BULK INSERT schema_name.table_name
FROM 'output_file.csv'
WITH
(
    DATAFILETYPE = 'widechar',
    FIRSTROW = 2,
    ROWTERMINATOR = '\\n'
)"

DBI::dbExecute(con, query_for_creating_table)
DBI::dbExecute(con, query_for_bulk_insert)
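If you prefer the UTF-8 route mentioned above, only the encoding of the csv and the WITH clause change. A sketch, assuming the same hypothetical table and file names:

```r
# write the csv as UTF-8 and tell BULK INSERT via CODEPAGE = '65001'
write.csv(failing_row, "output_file.csv", row.names = FALSE, fileEncoding = "UTF-8")

query_for_bulk_insert_utf8 = "BULK INSERT schema_name.table_name
FROM 'output_file.csv'
WITH
(
    CODEPAGE = '65001',
    FIRSTROW = 2,
    ROWTERMINATOR = '\\n'
)"
```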

Load all the non-failing rows, then append the failing row after

I have had some success in the past using the INSERT INTO syntax, so I would recommend loading the failing row with this approach.

Something like the following:

failing_row = local_df %>%
  filter(condition_to_get_just_the_failing_row)

non_failing_rows = local_df %>%
  filter(!condition_to_get_just_the_failing_row)

# write non-failing rows
dbWriteTable(con, SQL("schema.table"), non_failing_rows, append = TRUE)

# build the value list for the failing row
# (note: assumes text values contain no single quotes)
values = sapply(colnames(failing_row), function(col){
  value = failing_row[[col]]
  if(is.numeric(value)){
    as.character(value)
  } else {
    paste0("'", value, "'")
  }
})

insert_query = paste0(
  "INSERT INTO schema.table VALUES (",
  paste(values, collapse = ", "),
  ");"
)

# insert failing row
dbExecute(con, insert_query)
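If hand-building the SQL string proves fragile (embedded quotes, NULLs, or further special characters), DBI also supports parameterised queries via the params argument to dbExecute, which lets the driver bind the values so they never appear in the SQL text. A self-contained sketch, using an in-memory RSQLite database purely for illustration; the same mechanism works with the odbc connection from the question:

```r
library(DBI)

# illustration only: in-memory SQLite stands in for the real server
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE tbl (id INTEGER, note TEXT)")

failing_row <- data.frame(id = 1L, note = "contains \u03c6 phi")

# placeholders (?) are bound by the driver, so quotes and special
# characters in the values need no escaping
dbExecute(con,
          "INSERT INTO tbl VALUES (?, ?)",
          params = unname(as.list(failing_row)))

res <- dbGetQuery(con, "SELECT note FROM tbl")
dbDisconnect(con)
```

With the question's setup this would be dbExecute(conB, "INSERT INTO schema.table VALUES (?, ?)", params = unname(as.list(failing_row))), with one ? per column.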

Other resources

If you have not seen them already, here are several related Q&A that might assist: Arabic characters, reading UTF-8, encoding to MySQL, and non-Latin characters as question marks (though some of these are for reading data into R rather than writing it out).

Simon.S.A.