I see lots of questions, articles, and answers about using DynamoDB (a NoSQL database) to store metadata for objects in S3. I have more experience with relational databases than with NoSQL. Wouldn't a relational database be the better choice for the metadata, given all the different parameters and relationships you might want to search on to find an image stored in S3? That is what I would think. Also, when I look at this link, it seems DynamoDB is a bit problematic.
-
As with almost any "SQL v NOSQL" question, I think the answer is that it depends on what query patterns the choice needs to support. – Ben Thul Jan 26 '22 at 18:32
-
We would store documents and images on S3 and need to be able to search on anywhere from 4 to 20 different "metadata" parameters: company name, district, county, particular individuals, etc. We could store well over 70 TB of data, with the total number of files on S3 at, say, more than 300 million. The S3 layout will be bucket and key as image type (bucket)\year\orgid. These "relationships" stay within a bucket but will span years and org IDs. We can expect 50 buckets, so the 300 million files would break down to roughly 6 million files per bucket, and therefore per search. – Michael Barber Jan 27 '22 at 17:07
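To make the relational option concrete, here is a minimal sketch using Python's built-in `sqlite3`. Everything in it is an assumption for illustration: the `object_metadata` table, the three searchable columns, the sample rows, and the forward-slash key format are hypothetical, and a real deployment at this scale would use a server database rather than an in-memory SQLite file. It only shows how one row per S3 object can support searches on a varying set of parameters.

```python
import sqlite3

# Hypothetical whitelist of searchable metadata columns.
SEARCHABLE = {"company", "district", "county"}


def build_demo_db():
    """Create an in-memory table with one row per S3 object (sample data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE object_metadata (
            s3_bucket TEXT NOT NULL,
            s3_key    TEXT NOT NULL,
            company   TEXT,
            district  TEXT,
            county    TEXT,
            PRIMARY KEY (s3_bucket, s3_key)
        )
        """
    )
    # One secondary index as an example; which columns to index is a design choice.
    conn.execute(
        "CREATE INDEX idx_bucket_company ON object_metadata (s3_bucket, company)"
    )
    rows = [
        ("invoices", "2021/org17/a.pdf", "Acme", "North", "Kent"),
        ("invoices", "2022/org17/b.pdf", "Acme", "South", "Kent"),
        ("invoices", "2022/org42/c.pdf", "Globex", "North", "Essex"),
        ("permits", "2022/org17/d.png", "Acme", "North", "Kent"),
    ]
    conn.executemany("INSERT INTO object_metadata VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def find_keys(conn, bucket, **filters):
    """Return S3 keys in one bucket that match every supplied filter."""
    unknown = set(filters) - SEARCHABLE
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    clauses = ["s3_bucket = ?"]
    params = [bucket]
    # Column names come from the whitelist; values are bound as parameters.
    for column, value in sorted(filters.items()):
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = (
        "SELECT s3_key FROM object_metadata WHERE "
        + " AND ".join(clauses)
        + " ORDER BY s3_key"
    )
    return [row[0] for row in conn.execute(sql, params)]
```

Calling `find_keys(conn, "invoices", company="Acme", district="North")` narrows the search with two parameters, and adding or dropping a filter needs no schema change, which is the flexibility the question is asking about.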
-
Just the ambiguity of your answer ("need to be able to search for anywhere from 4 to 20 different parameters") suggests (to me anyways) that NoSQL is not a good fit. Generally speaking, NoSQL excels when the query patterns are known a priori and are not likely to change. That said, if some plurality of these parameters are always known and are highly selective (e.g. one or more of UserID, timestamp), NoSQL might still be a good fit. I'd suggest going through the data modeling exercise and seeing what happens. – Ben Thul Jan 27 '22 at 17:51
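One rough way to start that data modeling exercise is to list the expected query patterns and check which of them supply a partition key. The sketch below is plain Python, not DynamoDB API code, and the attribute names and patterns are hypothetical. It relies on one fact about DynamoDB: a Query needs an equality condition on the partition key of the table or of an index, and anything else falls back to a Scan.

```python
def plan_access(partition_keys, patterns):
    """Label each query pattern as a keyed 'query' or a full 'scan'.

    partition_keys: attribute names used as the partition key of the table
                    or of a secondary index (hypothetical design choices).
    patterns:       mapping of pattern name -> attributes known at query time.
    """
    plan = {}
    for name, supplied in patterns.items():
        usable = sorted(set(partition_keys) & set(supplied))
        # A pattern can use a keyed lookup only if it supplies some partition key.
        plan[name] = ("query", usable[0]) if usable else ("scan", None)
    return plan


# Hypothetical example: two candidate partition keys, three query patterns.
example_plan = plan_access(
    ["org_id", "company"],
    {
        "by company and county": ["company", "county"],
        "by individual": ["individual"],
        "by org and year": ["org_id", "year"],
    },
)
```

If most patterns come back as scans, or covering them would require an index per parameter, that is a sign the workload leans toward a relational store; if a few selective keys cover nearly everything, a key-value design may still work.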
-
So that is what I thought as well. No, there will be no common query parameters. Each bucket, unfortunately, will have different things to query. There are also "lots of relationships" between each set of images in the buckets. The most common things across the whole system and storage are already reflected in the bucket key names. After that, it all goes off the rails. – Michael Barber Jan 27 '22 at 17:57
-
Even your last statement doesn't make it cut and dry. If you treat each bucket as a separate "table", is there a low number of queries you'd want to do per bucket? – Ben Thul Jan 27 '22 at 22:16