0

I am querying data from BQ. I have to insert it to SQL Server 2008.

pseudocode:

# get results from bigquery

client = bigquery.Client()
result = client.query(sql).result()

# create parametrized insert sql statement

sql = 'INSERT INTO [SANDBOX].[dbo].[table_stg] VALUES ((?),(?),(?),(?),(?),(?),(?),(?),(?),(?))'

data = [(2016, 'STRING', 'STRING', 'STRING', 'STRING', 'STRING', INT, 'STRING', '09/28/2015', '09/25/2016'), 
(2016, 'STRING', 'STRING', 'STRING', 'STRING', 'STRING', INT, 'STRING', '09/28/2015', '09/25/2016')]

# target table schema

[varchar](50) NULL,
[varchar](50) NULL,
[varchar](50) NULL,
[varchar](50) NULL,
[varchar](50) NULL,
[varchar](50) NOT NULL,
[int] NULL,
[varchar](50) NULL,
[datetime] NULL,
[datetime] NULL


# insert into sql server with fast_executemany

cursor.fast_executemany = True
cursor.executemany(sql, data)
conn.commit()
cursor.close()
conn.close()

So when I run cursor.executemany(sql, data) with cursor.fast_executemany = True, I receive error:

Invalid character value for cast specification.

String data, right truncation: length 44 buffer 20, HY000

The code works perfectly without cursor.fast_executemany = True, yet it executes each insert with single commit meaning 8 hours to transfer 2 milion rows.

Thanks in advance for help. Also, if there is some filtering mechanism I could sue to split into fast_execute and normal execute I would be super gratefull.

EDIT: edited the question as per comments

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
mkylfnur
  • 1
  • 1
  • Why have you tagged both SQL Server and BigQuery? They're completely different products. – Thom A Jun 04 '23 at 09:25
  • Because I get the results from BigQuery by >client = bigquery.Client() >result = client.query(query).result() And maybe thats valid to solving my problem. – mkylfnur Jun 04 '23 at 09:30
  • possibly related: https://stackoverflow.com/q/61204788/2144390 – Gord Thompson Jun 04 '23 at 15:18
  • Thanks, thats a good read, I already tackled issues with empty columns and datetime. The only problem (i think) I have is with double quotes surrounding strings with single quotes inside as some kind of bq apoi - python - tsql 2008 conclict. Or maybe something else/more. – mkylfnur Jun 04 '23 at 20:01
  • Single quotes within a string parameter value will not be a problem for a well-formed parameterized INSERT query, e.g., `crsr.executemany("INSERT INTO my_table (my_column) VALUES (?)", [("my'problem",), ("what'ever",)])`. Consider adding a [mcve] that can replicate the issue. – Gord Thompson Jun 04 '23 at 23:54
  • Edited my question as per your suggestion. – mkylfnur Jun 05 '23 at 11:21

1 Answers1

0

The related question here illustrates how fast_executemany = True affects the way that empty string parameters are interpreted for datetime columns. With fast_executemany = False, empty strings are interpreted as 1900-01-01 00:00:00. With fast_executemany = True, empty string parameters cause the "Invalid character value for cast specification" error.

The same appears to be true for aa/bb/yyyy string parameters. That format is potentially ambiguous (Is it mm/dd/yyyy or dd/mm/yyyy?) and SQL Server makes assumptions under fast_executemany = False that it does not make under fast_executemany = True.

This does not throw an error for a column defined as d1 [datetime]

cursor.fast_executemany = False
cursor.executemany(
    "INSERT INTO [SANDBOX].[dbo].[table_stg] (d1) VALUES (?)",
    [("09/28/2015",)]
)

but this fails

cursor.fast_executemany = True
cursor.executemany(
    "INSERT INTO [SANDBOX].[dbo].[table_stg] (d1) VALUES (?)",
    [("09/28/2015",)]
)

However, this works:

cursor.fast_executemany = True
cursor.executemany(
    "INSERT INTO [SANDBOX].[dbo].[table_stg] (d1) VALUES (?)",
    [("2015/09/28",)]
)

So, either reformat the strings to yyyy/mm/dd or parse them into a datetime object

cursor.fast_executemany = True
cursor.executemany(
    "INSERT INTO [SANDBOX].[dbo].[table_stg] (d1) VALUES (?)",
    [(datetime.datetime.strptime("09/28/2015", "%m/%d/%Y"),)]
)
Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
  • Thanks Gord for this detailed answer. I am testing it right now. Can you recommend me some reading material so I could create a data validator compliant for fast_executemany? – mkylfnur Jun 05 '23 at 21:04