I am really struggling to understand how DynamoDB and Elasticsearch should be used together to support AWS data lake efforts (metadata / catalogs). It seems as though you would log the individual S3 locations of the zip archives for each source in DynamoDB, and index any additional metadata / attributes you want to search by in Elasticsearch. If that is correct, how would you use the two together? I tried to find more detailed information on how to pair them properly, but have been unsuccessful. Any information or documentation that others have would be great. There is a good chance I am overlooking some obvious examples or documentation.
What I am imagining is something like the following:
- A user searches Elasticsearch for metadata / attributes, and the matching documents point to the high-level S3 buckets / partitions.
- The DynamoDB query then uses the partition / bucket from the Elasticsearch result as part of its key.
- That query would most likely return many individual objects / keys, which could then be processed, extracted, etc.
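
To make the flow above concrete, here is a rough sketch of what I have in mind. It is only an in-memory mock-up, not real AWS code: a list of dicts stands in for the Elasticsearch index, a list of tuples stands in for the DynamoDB table, and every name in it (`CATALOG_DOCS`, `OBJECT_ITEMS`, `search_catalog`, `list_objects`, `find_objects`, the `example-lake` bucket) is made up for illustration. I am also assuming the DynamoDB table would use the S3 prefix as its partition key and the object name as its sort key, which may not be the right design.

```python
from collections import defaultdict

# Stand-in for the Elasticsearch index (assumption): one document per
# dataset / partition, with searchable attributes and the S3 prefix it describes.
CATALOG_DOCS = [
    {"source": "sales", "region": "us-east", "s3_prefix": "s3://example-lake/sales/2019/"},
    {"source": "sales", "region": "eu-west", "s3_prefix": "s3://example-lake/sales-eu/2019/"},
    {"source": "clickstream", "region": "us-east", "s3_prefix": "s3://example-lake/clicks/2019/"},
]

# Stand-in for the DynamoDB table (assumption): (partition key, sort key)
# pairs, where the partition key is the S3 prefix and the sort key is the object name.
OBJECT_ITEMS = [
    ("s3://example-lake/sales/2019/", "part-0002.zip"),
    ("s3://example-lake/sales/2019/", "part-0001.zip"),
    ("s3://example-lake/sales-eu/2019/", "part-0001.zip"),
    ("s3://example-lake/clicks/2019/", "part-0001.zip"),
]


def search_catalog(docs, **attributes):
    """Step 1 (the Elasticsearch role): return the S3 prefixes of all
    documents whose attributes match every given key/value pair."""
    return [
        doc["s3_prefix"]
        for doc in docs
        if all(doc.get(name) == value for name, value in attributes.items())
    ]


def build_object_index(items):
    """Group object names by prefix, mimicking a table keyed on the prefix."""
    index = defaultdict(list)
    for prefix, object_name in items:
        index[prefix].append(object_name)
    return index


def list_objects(index, prefix):
    """Step 2 (the DynamoDB role): return the full S3 URI of every object
    stored under one prefix, ordered by object name."""
    return [prefix + name for name in sorted(index.get(prefix, []))]


def find_objects(docs, index, **attributes):
    """Chain the two steps: metadata search first, then one key lookup
    per matching prefix."""
    results = []
    for prefix in search_catalog(docs, **attributes):
        results.extend(list_objects(index, prefix))
    return results


if __name__ == "__main__":
    index = build_object_index(OBJECT_ITEMS)
    for uri in find_objects(CATALOG_DOCS, index, source="sales", region="us-east"):
        print(uri)
```

In a real setup I would expect `search_catalog` to become an Elasticsearch query and `list_objects` to become a DynamoDB query on the partition key, but whether that split of responsibilities is correct is exactly what I am unsure about.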