As far as I know, the DuckDB is columnar database and can process and store sparse data efficiently.
It is indeed a column-store database, but I haven't seen any technical information (especially not any benchmarks) that demonstrate that DuckDB is any more space-efficient at storing sparse tables (i.e. mostly NULL
) than other RDBMS, columnar or rowstore. Whereas many incumbent RDMBS will prove difficult for DuckDB to unseat: for example, MS SQL Server supports both Sparse Tables and Columnar-storage too; while Postgres users will gladly remind you that NULL
s are cheap there too.
DuckDB's main selling-point is that it's an in-proc columnar database - so it's ideal for (say) SQLite or JET (Red or Blue) users who need/want to stay on an in-proc solution but need columnar storage for their application.
So, would it be possible to use it as "tuple space" or "document database"? I don't expect to get top performance from DuckDB in such use case, good enough performance would be enough.
There going to be one huge table with type
, title
, author
, duration
, director
, note
, todo
columns, with index on each, storing following objects (10 millions of objects)
What you're describing there is not what I'd call a "document" store because you aren't storing a single document blob addressed by a single key - and you aren't parsing-and-rewriting blob data to edit it, on the contrary: your data is largely normalized already; I'd refer to your design as being best-suited for a wide column store-type database (like Apache Cassandra and Azure Storage Tables). However, wide-column-store DBs also tend to be schema-free: that is, columns are created on-the-fly to correspond to the member-properties of the entities you store in the table: there's no requirement to use defined columns, and there's no way to constrain data to use those columns either.
...whereas DuckDB is like all the other RDBMS: you can only INSERT INTO
columns you defined in your prior CREATE TABLE
/ALTER TABLE
statements - which necessarily restricts its appropriateness for wide-column-store scenarios: if you have a new entity property you want to store you'll have to run DDL statements like ALTER TABLE
- which often (but not always) require a full table-rebuild. In fact, DuckDB's documentation (as of July 2023) warns that ALTER TABLE
is not a command you'll want to be using very often...
https://duckdb.org/docs/sql/statements/alter_table.html
ALTER TABLE
[...]
- At the moment DuckDB will not allow you to alter a table if there are any dependencies. That means that if you have an index on a column you will first need to drop the index, alter the table, and then recreate the index. Otherwise you will get a “Dependency Error.”
ADD TABLE
[...]
- The new column will be filled with the specified default value, or
NULL
if none is specified.
I'm not a DuckDB expert (or even a user), but the fact they say new columns will be "filled" implies that adding new columns is somewhat expensive, whereas other RDBMS (especially traditional row-store DBs) can add new columns instantly without needing to "fill" them on-disk.
...which makes me think that DuckDB is less-suitable for wide-column-store-style databases; see, in your example, you use the type
property/column to indicate which other properties/columns are expected to exist (e.g. type: 'note'
implies a textual note
column exists, while type: 'book'
implies author
and title
columns) - so if you expect to oftenly add more possible types, with their associated columns, then DuckDB is probably not the best choice. (Though I do like how DuckDB supports CHECK
constraints, which you can use to enforce a schema even in a poor-man's-polymorphic table as in your example - though CHECK
constraints are a feature of all major RDBMS).
As a wry aside, I strongly suggest everyone just read Codd's paper, and grok some Relational Algebra - once you internalize it you'll never need nor want to use a denormalized store like Key/Value, Document, Wide-Column Stores, et cetera). (Anecdotally, the only people I've ever seen willingly move from RDBMS to Document-store were people who didn't want to learn SQL - or had no clue about indexes). (While I have yet to be wooed-over by a Document-store system, other DB types, such as Graph databases like Neo4j, or vector-databases, are fantastic - and key/value blob stores are also incredibly useful when used appropriately (e.g. S3) - my advice is to just steer-clear of schema-free "JSON"-document-style stores.