3

I am using Duckdb to insert data by Batch Insert

While using following code

conn.execute('INSERT INTO Main SELECT * FROM df')

I am getting following error

Invalid Input Error: Failed to cast value: Unimplemented type for cast (VARCHAR -> NULL)

I tried using

df.fillna('N/A')

to fill any null values to avoid the error but still I am getting the same error. How to fix this?

  • 1
    I ran into the same problem today. After some trial and error, I believe it is due to how DuckDB infers column types. I had though DuckDB would use the Pandas dtypes as help, but it appears DuckDB is independently inferring dtypes as if the dataframe were a .CSV file. This problem impacted a column for me that wasn't completely null but mostly null. When I used .fillna("null"), the error resolved. I also sorted the dataframe by the column with the issue and then loaded the data - no issues again. I hope this helps. – Jeff D. White Feb 23 '23 at 21:32

1 Answers1

10

From the docs: By default, DuckDB reads the first 100 lines of a dataframe to determine the data type for Pandas "object" columns. If those 100 lines are null, it might guess the wrong type.

Override this behavior with:

# example setting the sample size to 100000
duckdb.default_connection.execute("SET GLOBAL pandas_analyze_sample=100000")

If using a named connection, replace duckdb.default_connection with your specific DuckDB connection variable.

Thanks to Jeff's comment for pointing me in the right direction on this one.

  • 1
    I found myself here again after a few months - thank you for finding a cleaner solution. I used the following in case this is helpful: duckdb_connection = duckdb.connect(), followed by duckdb_connection .execute("SET GLOBAL pandas_analyze_sample=100000") – Jeff D. White May 10 '23 at 22:02