I have a Pandas dataframe with a column containing dictionary values. I'd like to query this dataframe using DuckDB and convert the result to another dataframe, and have the type preserved across the query.
DuckDB has the MAP
data type which looks like a good match for a dictionary, but when selecting the column it's turned into a VARCHAR
, and results in a string-type column if I convert back to a dataframe.
Is there some way to preserve the type, or at least a good way to convert the string back to a dictionary when generating the new dataframe?
>>> # Create a dataframe with a column containing a dictionary
>>> df = pd.DataFrame([[{'some': 'dict', 'with': 'stuff'}]], columns=['mycol'])
>>> df
mycol
0 {'some': 'dict', 'with': 'stuff'}
>>> type(df['mycol'][0])
<class 'dict'>
>>> # Select that column using DuckDB - it becomes a VARCHAR
>>> duckdb.query('select mycol from df')
---------------------
-- Expression Tree --
---------------------
Subquery
---------------------
-- Result Columns --
---------------------
- mycol (VARCHAR)
---------------------
-- Result Preview --
---------------------
mycol
VARCHAR
[ Rows: 1]
{'some': 'dict', 'with': 'stuff'}
>>> # Converting the query result to another dataframe results in a string-type column
>>> df2 = duckdb.query('select mycol from df').to_df()
>>> df2
mycol
0 {'some': 'dict', 'with': 'stuff'}
>>> type(df2['mycol'][0])
<class 'str'>
>>> # An explicit cast to MAP doesn't work
>>> duckdb.query('select CAST(mycol as MAP(VARCHAR, VARCHAR)) from df')
---------------------
-- Expression Tree --
---------------------
Subquery
---------------------
-- Result Columns --
---------------------
- CAST(mycol AS MAP<VARCHAR, VARCHAR>) (MAP<VARCHAR, VARCHAR>)
---------------------
-- Result Preview --
---------------------
Conversion Error: Conversion Error: Unimplemented type for cast (VARCHAR -> MAP<VARCHAR, VARCHAR>)