Sources indicate that DynamoDB is a key/value store, document store, and/or wide-column store:

At the core, DynamoDB is a key/value store.

If the value stored is a document, DynamoDB provides some support for working with the underlying document. Even Amazon agrees. So far, so good.

However, I've seen some claims that DynamoDB is actually a wide-column store (1, 2, 3, etc.). This seems odd to me, since as I understand it, a wide-column store would technically require a different data storage model.

Is it appropriate to consider DynamoDB to be a wide-column store?

rinogo

3 Answers

I asked a similar question in "How do you call the data model of DynamoDB and Cassandra?". There I noted that Cassandra and DynamoDB, which have very similar data models, are sometimes called "wide-column stores" because of their sort key feature:

In DynamoDB (and in Cassandra), items are stored inside a partition contiguously, sorted by the so-called "sort key". To locate an item, you need to specify its partition key, and inside that partition, specify its sort key. This is exactly the two-dimensional key-value store described in Wikipedia's definition of wide-column store https://en.wikipedia.org/wiki/Wide-column_store.
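
To make the "two-dimensional key-value store" idea concrete, here is a minimal in-memory sketch in Python. It is only a toy model of the data layout described above, not how DynamoDB or Cassandra is implemented; the class name `TwoLevelStore` and the sample keys are made up for illustration.

```python
from bisect import bisect_left, bisect_right, insort


class TwoLevelStore:
    """Toy model: partition key -> items kept ordered by sort key."""

    def __init__(self):
        # partition key -> (sorted list of sort keys, {sort key: item})
        self._partitions = {}

    def put(self, pk, sk, item):
        keys, items = self._partitions.setdefault(pk, ([], {}))
        if sk not in items:
            insort(keys, sk)  # keep the partition ordered by sort key
        items[sk] = item

    def get(self, pk, sk):
        """Point lookup: both dimensions of the key are required."""
        _, items = self._partitions.get(pk, ([], {}))
        return items.get(sk)

    def query(self, pk, lo, hi):
        """Range lookup: items of one partition with lo <= sort key <= hi."""
        keys, items = self._partitions.get(pk, ([], {}))
        start, end = bisect_left(keys, lo), bisect_right(keys, hi)
        return [items[k] for k in keys[start:end]]


store = TwoLevelStore()
store.put("user#1", "2022-10-23", {"event": "logout"})
store.put("user#1", "2022-10-22", {"event": "login"})
store.put("user#2", "2022-10-22", {"event": "login"})

print(store.get("user#1", "2022-10-23"))
print(store.query("user#1", "2022-10-01", "2022-10-22"))
```

The point of the sketch is that the sort key gives you ordered range scans inside a partition, which a plain one-dimensional key/value store does not.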

The historic evolution of a wide-column store into a DynamoDB-like one is easier to understand in the context of Cassandra, whose data model is more-or-less the same as DynamoDB's: Cassandra started its life as a real "wide column store": Each row (called "partition") had an unlimited number of unrelated columns. Later, CQL was introduced which added the concept of a "clustering key" (this is Cassandra's equivalent of DynamoDB's sort key), and now each partition was no longer a very long list of unrelated columns - instead it became a very long (and sorted) list of separate items. I explained this evolution in my answer https://stackoverflow.com/a/47127723/8891224 comparing Cassandra's data model to Google Bigtable, which was the quintessential wide-column store.

Nadav Har'El

How does Wikipedia define a wide-column store?

https://en.wikipedia.org/wiki/Wide-column_store opens with:

A wide-column store (or extensible record store) is a type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. A wide-column store can be interpreted as a two-dimensional key–value store.

DynamoDB has tables, rows (called items), and columns (called attributes). The names and format can vary from row to row (except for the primary key).
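
As a rough illustration of "names and format can vary from row to row", here is a small Python sketch using plain dictionaries. The attribute names (`pk`, `sk`, `name`, `total`, and so on) are hypothetical, and the `validate` helper only mimics the one rule stated above: every item must carry the primary key attributes.

```python
# Hypothetical table: only the primary key attributes are mandatory.
KEY_ATTRIBUTES = ("pk", "sk")

items = [
    {"pk": "user#1", "sk": "profile", "name": "Ada", "email": "ada@example.com"},
    {"pk": "user#1", "sk": "order#17", "total": 42, "currency": "USD"},
    {"pk": "user#2", "sk": "profile", "name": "Lin"},
]


def validate(item):
    """Reject items that lack a key attribute; everything else is free-form."""
    missing = [k for k in KEY_ATTRIBUTES if k not in item]
    if missing:
        raise ValueError(f"missing key attributes: {missing}")
    return item


def non_key_attributes(item):
    """The per-item 'columns', which differ from one item to the next."""
    return sorted(k for k in item if k not in KEY_ATTRIBUTES)


for item in items:
    validate(item)
    print(item["pk"], item["sk"], non_key_attributes(item))
```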

I think most wide-column stores define their table's schema centrally while DynamoDB lets each item define its own schema.

A simple key-value store would only let you look up by a key value. DynamoDB gives you a lot more choices.

At the end of the day this nomenclature is just our collective attempt to group things into similar buckets. There are naturally going to be some fuzzy edges.

hunterhacker

To add to the great answer by Nadav: be careful about treating DynamoDB as a wide-column datastore...

Of course you can use wide-column patterns with DynamoDB, for instance with key range queries (though the sort key must be designed carefully, and nothing protects you from mistakes). But there is a hard limit: a single item (row) cannot exceed 400 KB. This is fine for most cases, but very narrow if you want to store, say, hundreds of columns of data, and that is generally what you want to do with a wide-column datastore. Working around the limit is, simply put, hell: you end up adding other tables and joins to compensate.
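
One way to picture working within the 400 KB limit is to spread the columns of one logical row over several items in the same partition, each with its own sort key. The Python sketch below is only an illustration under loud assumptions: the size estimate uses JSON encoding, which is not DynamoDB's actual item-size accounting, and the names `pk`, `sk`, and `chunk#NNNN` are invented for the example.

```python
import json

MAX_ITEM_BYTES = 400 * 1024  # the 400 KB per-item limit discussed above


def approx_size(item):
    """Rough estimate via JSON encoding (an assumption, not DynamoDB's formula)."""
    return len(json.dumps(item).encode("utf-8"))


def split_wide_row(pk, columns, max_bytes=MAX_ITEM_BYTES):
    """Spread one logical row's columns over several items in one partition."""
    items = []
    current = {"pk": pk, "sk": "chunk#0000"}
    for name, value in columns.items():
        if approx_size({"pk": pk, "sk": "chunk#0000", name: value}) > max_bytes:
            raise ValueError(f"column {name!r} alone exceeds the item limit")
        candidate = {**current, name: value}
        if approx_size(candidate) > max_bytes:
            # Current item is full: close it and start the next chunk.
            items.append(current)
            current = {"pk": pk, "sk": f"chunk#{len(items):04d}", name: value}
        else:
            current = candidate
    items.append(current)
    return items


def merge_items(items):
    """Reassemble the logical row from its chunks, in sort key order."""
    merged = {}
    for item in sorted(items, key=lambda i: i["sk"]):
        merged.update({k: v for k, v in item.items() if k not in ("pk", "sk")})
    return merged


columns = {f"col{i}": "x" * 1000 for i in range(1000)}  # roughly 1 MB in total
items = split_wide_row("row#1", columns)
print(len(items), [approx_size(i) for i in items])
assert merge_items(items) == columns
```

Reading the row back then becomes a range query over the `chunk#` sort keys in that partition, which is the bookkeeping that makes this workaround painful.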

If you are really interested in using a wide-column datastore on AWS, I would personally use AWS Keyspaces for that, since it doesn't have the limits of DynamoDB. It requires you to design a database schema, but with that many columns I see that as a plus. CQL is also better than the DDB query API.

zenbeni
  • You're right - if you want to use DynamoDB as a "wide column" database you'll have to somehow use the sort-key concept to do that properly and insert them across many items in the same partition - you can't just stick all the columns in a single "item". I don't understand the suggestion to use AWS Keyspaces. Isn't this just Cassandra, with exactly the same data model of sort keys (called "clustering keys" in Cassandra) and can no longer use the old-style partition-just-gazillion-unrelated-columns API ("thrift")? – Nadav Har'El Oct 22 '22 at 08:56
  • It is a serverless version of Cassandra with less CQL support that uses auto-scaling partitions like DynamoDB. So it is not really Cassandra: your logical partitions are not tied to your physical partitions, as you don't manage them (and AWS manages partition duplication to support increased load on RU/WU spikes). The data model / sort keys are similar to Cassandra's, but the runtime is very different, and so are the usage constraints. I see that as a plus, since managing a Cassandra cluster yourself is hell: no tombstones, no JVM tuning, etc. – zenbeni Oct 22 '22 at 10:24
  • zenbeni - but all of this doesn't say you can have in AWS Keyspaces a single item with a huge number of cells. First it's hard to do since there is a schema, but even if you work around it (e.g., use a map column) a single huge item won't work - you need to use a partition with multiple rows in it. That's exactly the same "problem" you have with DynamoDB. So switching to AWS Keyspaces won't solve anything. – Nadav Har'El Oct 22 '22 at 11:03
  • I don't think I understand you. AWS Keyspaces can handle millions of columns, with schema support, which is good for data efficiency (fixed max size of column data, CQL support). Of course you have a big schema, but it is not complex. One row in any partition can easily exceed 400 KB without trouble, thus bypassing the DDB limit that prevents you from putting too many things in a row. – zenbeni Oct 22 '22 at 11:28
  • The 400KB limit on one item (CQL "row") in DynamoDB is arbitrary. I'm not surprised AWS Keyspaces doesn't have this specific limit. But can you have a 400 MB single row? I don't see how. First of all because of the schema, I assume you mean the very large row has a very large map inside it. The CQL API cannot support a 400 MB map because, among other things, you have no API to page through it when reading it. – Nadav Har'El Oct 22 '22 at 13:14
  • Don't mistake rows for cells. A row has cells (one cell per table column), and a row item cannot exceed 400 KB in DynamoDB (all cells of one row). That also includes the table column definition (in fact you have less than 400 KB available for the row, as the definition takes some space) and secondary indices (hidden occupied space that you have to estimate by hand). No such constraints in AWS Keyspaces. – zenbeni Oct 22 '22 at 15:21
  • Again, I agree that the specific constraint of a 400 KB row does not exist in AWS Keyspaces, but I maintain that you *cannot* have a 400 MB row there, if for no other reason than that you have no way to read this 400 MB row (CQL doesn't have any way to ask to read a partial collection or string, or to page through one). – Nadav Har'El Oct 23 '22 at 15:06
  • Really? Just fetch by partitionKey and sortKey, naming some of the columns you want (not all, if that would exceed the allowed CQL payload size), then run another query to fetch the other columns, so the CQL payload size limit does not block you. We chunk CQL queries into multiple ones to exceed the allowed size and it works very well so far. I don't think you tried a lot of things on AWS Keyspaces. – zenbeni Oct 23 '22 at 16:10
  • yes, by fetching individual columns you are not limited to any specific size like 400KB, but how would you get to 400 MB this way? You'll have a thousand named columns? This isn't what normally happens. Normally you get exceedingly-large items by having a very large collection column, or a very large string/blob. Those you cannot retrieve pieces of (well, again technically you can retrieve an individual collection item if you know it exists, but you don't know which items a collection has). – Nadav Har'El Nov 15 '22 at 07:45
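
The column-chunking approach described in the comments could look roughly like the Python sketch below, which only builds the CQL statements and does not talk to any database. The table name `ks.wide_table`, the key columns `pk` and `sk`, and the batch size are all hypothetical. As noted in the last comment, this does not help when a single collection or blob column is itself very large.

```python
def chunked(seq, size):
    """Yield consecutive slices of seq with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def build_selects(table, columns, per_query):
    """One SELECT per batch of columns, all addressing the same row."""
    return [
        f"SELECT {', '.join(batch)} FROM {table} WHERE pk = ? AND sk = ?"
        for batch in chunked(columns, per_query)
    ]


# Hypothetical wide table with many named columns.
all_columns = [f"col{i}" for i in range(10)]
for statement in build_selects("ks.wide_table", all_columns, per_query=4):
    print(statement)
```

Each statement would be executed with the same partition key and sort key values, and the partial results merged on the client.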