I am not completely new to AWS, but I need a design suggestion for my architecture.
I have to load roughly 50 files from different sources every month, and the files are small: less than 500 MB each.
I am reading them from S3, loading them into a Delta table using Databricks, and then exposing them through Databricks SQL.
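For reference, the load is roughly this (a minimal sketch; the bucket path, source format, and table name are placeholders, not my real ones):

```python
# Monthly batch load: read the ~50 small files from S3 and append them to a Delta table.
# s3://my-bucket/monthly-drop/ and analytics.monthly_files are placeholder names.
df = (spark.read
      .format("csv")                 # adjust to the actual source format
      .option("header", "true")
      .load("s3://my-bucket/monthly-drop/"))

(df.write
   .format("delta")
   .mode("append")
   .saveAsTable("analytics.monthly_files"))
```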
- Do I really need to worry about partitioning my Delta table, given that the data volume is this small?
- Is there any way to partition a Delta table by size?
- I am not sure I completely understand how VACUUM/OPTIMIZE run on a non-partitioned Delta table, so my plan is to run OPTIMIZE and VACUUM once a month, right after the load (see the sketch after this list).
Does this sound correct?
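Concretely, I am thinking of something like this after each monthly load (a sketch; the table name is a placeholder, and 168 hours is just the 7-day default retention):

```python
# Post-load maintenance on the non-partitioned Delta table.
# analytics.monthly_files is a placeholder table name.
spark.sql("OPTIMIZE analytics.monthly_files")                 # compact small files
spark.sql("VACUUM analytics.monthly_files RETAIN 168 HOURS")  # remove old, unreferenced files
```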
Please share suggestions from your experience/implementations.
Sankar