Background:
DuckDB allows direct querying of Parquet files, e.g. con.execute("Select * from 'Hierarchy.parquet'")
Parquet allows files to be partitioned by column values. When a Parquet file is partitioned, a top-level FOLDER is created with the name of the Parquet file, plus one subfolder per column value; these subfolders contain the actual Parquet data files. e.g. Hierarchy.parquet (folder) --> date=20220401 (subfolder) --> part1.parquet
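To make the layout concrete, here is a minimal stdlib-only sketch that recreates the folder structure described above and recovers each data file's partition value from the `column=value` directory names. The helper `partition_values` is hypothetical, and the part files are empty placeholders, not real Parquet data. It shows why a path like 'Hierarchy.parquet' names a directory rather than a single readable file.

```python
import tempfile
from pathlib import Path
from typing import Dict


def partition_values(root: Path) -> Dict[str, Dict[str, str]]:
    """Map each data file under a partitioned folder (relative path) to the
    partition columns parsed from its `column=value` directory names."""
    result = {}
    for data_file in sorted(root.rglob("*.parquet")):
        if not data_file.is_file():
            continue  # only actual data files carry rows
        rel = data_file.relative_to(root)
        # Each directory level between the root and the file is one key=value pair.
        cols = dict(part.split("=", 1) for part in rel.parts[:-1] if "=" in part)
        result[rel.as_posix()] = cols
    return result


with tempfile.TemporaryDirectory() as tmp:
    # Hierarchy.parquet is a folder, not a file, once the data is partitioned.
    root = Path(tmp) / "Hierarchy.parquet"
    for date in ("20220401", "20220402"):
        sub = root / f"date={date}"
        sub.mkdir(parents=True)
        (sub / "part1.parquet").write_bytes(b"")  # empty placeholder file
    root_is_folder = root.is_dir()
    layout = partition_values(root)

print(root_is_folder)
print(layout)
```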
Expected behavior
DuckDB can query both partitioned AND unpartitioned Parquet files.
Observed behavior
DuckDB fails when querying partitioned Parquet files but works with unpartitioned ones.
con.execute("Select * from 'Hierarchy.parquet'")
fails with
RuntimeError: IO Error: No files found that match the pattern "Hierarchy.parquet"
when Hierarchy.parquet is partitioned.
Querying the underlying individual data files works fine:
con.execute("Select * from 'Hierarchy.parquet/date=20220401/part1.parquet'")
Is there a way to query partitioned parquet files with DuckDB? Or is this a limitation/bug?