4

I have a Quick Sight dashboard pointed to Athena table. Now I want to schedule to refresh SPICE every hour. As per documentation, Refreshing imports the data into SPICE again, so the data includes any changes since the last import.

If I have a 2TB dataset in Athena and every hour new data added in Athena. So QuickSight will load 2TB every hour to find the delta? if yes, it will increase the Athena cost. Does QuickSight query on Athena to fetch data?

lucy
  • 4,136
  • 5
  • 30
  • 47

2 Answers2

3

As of the date of answering (11/11/2019) SPICE does in fact perform a full data set reload (i.e. no delta calculation or incremental refresh). I was able to verify this by using a MySQL data set and watching the query log while the refresh was occurring.

The implication for your question is that you would be charged every hour for Athena to query the 2TB data set.

If you do not need the robust querying that Athena provides, I would recommend pointing QuickSight to the S3 data directly.

mjgpy3
  • 8,597
  • 5
  • 30
  • 51
0

My data is in parquet format. I guess Quicksight does not support a direct query on s3 parquet data.

  • Yes, we need to use Athena to read the parquet.

When you say point QuickSight to S3 directly, do you mean without SPICE?

  • Don't do it, it will increase the Athena and S3 costs significantly.

Sollution:

  • Collect the delta from your source.
  • Push it into S3 (Unprocessed data)
  • Create a lambda function to pre-process the data (if needed)
  • Set up a trigger for lambda.
  • Process the data in lambda, and convert the data to parquet format with gzip compression.
  • Push the data into S3 (Processed data)
  • Remove the unprocessed data from S3 or set up an S3 lifecycle to maintain it.
  • Also create a metadata table with primary_key and required fields.
  • S3 & Athena do not support update records, so each time you push the data it will be appended to the old data, and the entire data will be scanned.
  • Both S3 and Athena follow the scan-first approach, so even though you are applying a filter it will scan the entire data before it applies the filter.
  • Use the metadata table to remove the old entry and insert the new entry.
  • Use partitions wherever possible to avoid scanning the entire data.
  • Once the data is available, configure Quicksight data refresh to pull the data into SPICE.

Best practice:

  • Always go with SPICE (Direct queries are expensive and have high latency)
  • Use the incremental refresh wherever possible.
  • Always use static data, do not process the data for each dashboard visit/refresh.

Increase your Quicksight SPICE data refresh frequency

GURU RAJAN
  • 11
  • 2