How does DynamoDB store data?

Question

DynamoDB stores data as key-value pairs, where the key is a kind of primary key and the value is the data associated with it. I want to know whether DynamoDB actually understands the value (JSON). By value I mean the JSON object associated with a key (a row in an RDBMS). Does DynamoDB understand that there are attributes, and attribute values, that it is going to store?

Context: I have a person table in DynamoDB with many attributes, say 100, one of which is age. Suppose I need to fetch some records based on age. If DynamoDB goes through the entries one by one and reads each record, and each record is fairly large, does DynamoDB read the entire record, or can it access only the age attribute in constant time regardless of the record's size?

DynamoDB is going to read all your items if you scan the table and filter on age. If you want this to be efficient, create a GSI with age as the GSI sort key then your query can include `age BETWEEN 10 AND 20` in your `KeyConditionExpression`. — jarmod, Nov 29 '20 at 20:41
Yeah, I will do that. For the question above, I just want to know whether, for each item, it will read only the age attribute. Once it has the record, can it read age directly, or does it need to read the entire record to read age? Say the table has many person records and it is at record x: can it read x's age directly, or does it need to read the entire record to find where age is among x's attributes? — rahul sharma, Nov 29 '20 at 20:44
If the attribute is not keyed then DynamoDB reads the item before it can filter on a given value of that attribute, afaik. If it's keyed then DynamoDB can simply use the index (which contains the value). — jarmod, Nov 29 '20 at 21:28

score 7 · Accepted Answer · answered Nov 30 '20 at 01:33

Does DynamoDB understand that there are attributes, and attribute values, that it is going to store?

No, it does not.

DynamoDB is a "wide column" style of NoSQL database. While the schema isn't defined beyond the primary key at table construction time, the querying abilities are limited to primary keys or secondary indexes. Creating Global Secondary Indexes allows you to query against other attribute values. Local Secondary Indexes can be queried too, but they're a bit of an odd duck. See here for a good comparison of the two secondary index types.
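As a rough sketch of querying by age through a Global Secondary Index, the snippet below only builds the request parameters. The table name `Person`, the index name `country-age-index`, and the `country` partition key are hypothetical; the only part taken from the discussion is using `age` as the index sort key with a `BETWEEN` condition.

```python
def build_age_range_query(country, min_age, max_age):
    """Build keyword arguments for a DynamoDB Query against a GSI.

    Hypothetical names: table "Person", index "country-age-index" with
    partition key "country" and sort key "age".
    """
    return {
        "TableName": "Person",
        "IndexName": "country-age-index",
        # A Query needs an equality condition on the index partition key;
        # the range condition is only allowed on the sort key.
        "KeyConditionExpression": "#c = :country AND #a BETWEEN :lo AND :hi",
        "ExpressionAttributeNames": {"#c": "country", "#a": "age"},
        # The low-level API encodes numbers as strings under the "N" type tag.
        "ExpressionAttributeValues": {
            ":country": {"S": country},
            ":lo": {"N": str(min_age)},
            ":hi": {"N": str(max_age)},
        },
    }


params = build_age_range_query("IN", 10, 20)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```

Because the range condition sits in `KeyConditionExpression`, DynamoDB can use the index to locate matching entries instead of reading every item in the table.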

If your needs do include querying inside the attributes, check out some of the "document-oriented" style of NoSQL databases, of which MongoDB is the one most people think of. If you're already embedded in the AWS ecosystem and don't want to break out of it, AWS offers DocumentDB as a MongoDB-compatible service managed by AWS.

Wide-column and document-style data stores have different pros & cons. Generally speaking, the wide-column approach is better for extreme scalability at consistent cost & speed, whereas the document-oriented approach gives more flexibility as your data access patterns evolve over time. Choose the one that suits your needs best.

Peter Wagener

Actually, I was reading a blog that says querying on a non-indexed column is faster in SQL than in NoSQL, because SQL can go to that particular column, scan all rows, and check which ones satisfy the criteria. In NoSQL, it has to read the entire record for a row, since it cannot read a non-indexed column by just going to that particular column. So I got confused. – rahul sharma Nov 30 '20 at 17:01
It's a choice between scalability and flexibility. DynamoDB expects you to express most of your access patterns up front, but not to worry about the structure of your data. What you get from doing that is consistent query times _regardless of how your data grows_. Relational databases offer the opposite: describe the structure of your data first, but figure out the access patterns later. This lets you introduce new access patterns as you see fit, but as your data grows the access time slows down. – Peter Wagener Dec 01 '20 at 11:48
@PeterWagener DynamoDB has to read the entire record, so the access time increases as the average record size grows. SQL could read just the column and have consistent access time. For an unindexed field, both SQL and NoSQL need to scan the entire primary-key column, so there is no difference in that part, which is likely to dominate the access time anyway. For an indexed field, only the first part matters, so is NoSQL better in either case? – Zack Light Dec 01 '22 at 12:38

score 1 · Answer 2 · answered Nov 29 '20 at 21:19

You can't do that. The whole item is always retrieved, and that's what you pay for. What you can do:

Use a GSI and project only the attributes you need into it; this way you pay only for those attributes.
Use ProjectionExpression; it returns only the specified attributes from the database, so you'll have lower network usage. But it is applied after the actual read from the database, so you still pay for retrieving the whole item.
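To make the cost difference concrete, here is a small arithmetic sketch. It assumes the documented provisioned-capacity rule that a read unit covers up to 4 KB (taken here as 4096 bytes), that item sizes in a Query or Scan are summed before rounding up, and that eventually consistent reads cost half. The item sizes are made up for illustration.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # assumed: one read unit covers up to 4 KB


def query_read_units(item_sizes_bytes, strongly_consistent=False):
    """Estimate read capacity units consumed by a Query or Scan.

    Item sizes are summed first, then rounded up to the next 4 KB.
    An eventually consistent read costs half of a strongly consistent one.
    """
    total = sum(item_sizes_bytes)
    units = math.ceil(total / READ_UNIT_BYTES)
    return units if strongly_consistent else units / 2


# 100 matching items of 10 KB each, read from the base table.
# A ProjectionExpression would not change this figure.
base_table_cost = query_read_units([10 * 1024] * 100)

# The same 100 entries read from a GSI that projects only ~100 bytes each.
gsi_cost = query_read_units([100] * 100)

print(base_table_cost, gsi_cost)
```

The estimate depends only on the stored size of what is read, which is why a narrow GSI projection lowers the cost while ProjectionExpression does not.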

karjan

Oh. So if I haven't created a GSI and the attribute I am interested in is not the primary key, then there is no way to read those records? The only way is to do a full scan and add filtering logic in the application layer? I understand that what I am asking is not practical; I just want to understand DynamoDB better. Is my understanding correct? – rahul sharma Nov 29 '20 at 21:32
Not exactly. You always pay for reading the whole item. In addition, if you scan, you read the whole table and you pay for it. ProjectionExpression is applied at the DynamoDB layer, so your application receives only the specified properties, but you still pay for the whole item. – karjan Nov 30 '20 at 08:07
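For the no-index case, the filtering can be pushed to DynamoDB rather than the application, although every item is still read and billed. The sketch below only builds the Scan parameters; the table name `Person` and the `name` attribute are hypothetical.

```python
def build_age_scan(min_age, max_age):
    """Build keyword arguments for a filtered DynamoDB Scan.

    Hypothetical names: table "Person" with attributes "name" and "age".
    """
    return {
        "TableName": "Person",
        # Applied after each item has been read, so it does not reduce cost.
        "FilterExpression": "#a BETWEEN :lo AND :hi",
        # Trims the response payload only; "name" is a reserved word,
        # hence the #n placeholder.
        "ProjectionExpression": "#n, #a",
        "ExpressionAttributeNames": {"#a": "age", "#n": "name"},
        "ExpressionAttributeValues": {
            ":lo": {"N": str(min_age)},
            ":hi": {"N": str(max_age)},
        },
        # Ask DynamoDB to report the capacity the request consumed.
        "ReturnConsumedCapacity": "TOTAL",
    }


scan_params = build_age_scan(10, 20)
# With boto3 installed and credentials configured:
#   response = boto3.client("dynamodb").scan(**scan_params)
# Comparing response["ScannedCount"] with response["Count"] shows how many
# items were read versus how many passed the filter.
```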
