
An AWS Timestream database is queried through the Grafana API and the results are shown on dashboards.

Everything works well when we query a small number of data points, but my queries fail while fetching data when I request too much, e.g. 1-2 months of data for 100 or more dimensions.

  • Are you aggregating your data in your query? If you fetch every single data point for a few months at a sampling rate of, for example, one point per second, that is an enormous number of data points. You need to aggregate (GROUP BY) your data. – kaptan Apr 26 '23 at 21:52
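To make that suggestion concrete, here is a minimal sketch of pushing the aggregation into Timestream so that only hourly averages come back instead of raw per-second points. The database, table, and column names are hypothetical; `bin()`, `ago()`, and `measure_value::double` are standard Timestream SQL.

```python
import boto3

# Hypothetical names for illustration. bin(time, 1h) buckets rows into
# hourly bins and ago(60d) restricts the scan to the last 60 days.
query = """
SELECT device_id,
       bin(time, 1h) AS binned_time,
       avg(measure_value::double) AS avg_value
FROM "my_db"."my_table"
WHERE measure_name = 'cpu_usage'
  AND time > ago(60d)
GROUP BY device_id, bin(time, 1h)
ORDER BY binned_time
"""

client = boto3.client("timestream-query")
response = client.query(QueryString=query)
```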

2 Answers


As stated in the AWS Timestream docs, there are some best practices that, if you follow them, will keep your queries quite fast. I can vouch that, obeying those rules, you can return a huge data set (4M records) in under 40 seconds.

Adding to the guidance below, I would also suggest avoiding high-cardinality dimensions. To explain: if you have a dimension whose values grow indefinitely (a timestamp-like value, for example), the indexes on that dimension will get out of hand and, soon, your queries will be too slow to be useful.

The original document can be found here (some links in the list were not pasted; consult the doc). A short query sketch combining several of these rules follows the list.

Following are suggested best practices for queries with Amazon Timestream.

Include only the measure and dimension names essential to query. Adding extraneous columns will increase data scans, which impacts the performance of queries.

Where possible, push the data computation to Timestream using the built-in aggregates and scalar functions in the SELECT clause and WHERE clause as applicable to improve query performance and reduce cost. See SELECT and Aggregate functions.

Where possible, use approximate functions. E.g., use APPROX_DISTINCT instead of COUNT(DISTINCT column_name) to optimize query performance and reduce the query cost. See Aggregate functions.

Use a CASE expression to perform complex aggregations instead of selecting from the same table multiple times. See The CASE statement.

Where possible, include a time range in the WHERE clause of your query. This optimizes query performance and costs. For example, if you only need the last one hour of data in your dataset, then include a time predicate such as time > ago(1h). See SELECT and Interval and duration.

When a query accesses a subset of measures in a table, always include the measure names in the WHERE clause of the query.

Where possible, use the equality operator when comparing dimensions and measures in the WHERE clause of a query. An equality predicate on dimensions and measure names allows for improved query performance and reduced query costs.

Wherever possible, avoid using functions in the WHERE clause to optimize for cost.

Refrain from using the LIKE clause multiple times. Rather, use regular expressions when you are filtering for multiple values on a string column. See Regular expression functions.

Only use the necessary columns in the GROUP BY clause of a query.

If the query result needs to be in a specific order, explicitly specify that order in the ORDER BY clause of the outermost query. If your query result does not require ordering, avoid using an ORDER BY clause to improve query performance.

Use a LIMIT clause if you only need the first N rows in your query.

If you are using an ORDER BY clause to look at the top or bottom N values, use a LIMIT clause to reduce the query costs.

Use the pagination token from the returned response to retrieve the query results. For more information, see Query.

If you've started running a query and realize that the query will not return the results you're looking for, cancel the query to save cost. For more information, see CancelQuery.

If your application experiences throttling, continue sending data to Amazon Timestream at the same rate to enable Amazon Timestream to auto-scale to satisfy the query throughput needs of your application.

If the query concurrency requirements of your applications exceed the default limits of Timestream, contact AWS Support for limit increases.
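As mentioned above, here is a sketch that combines several of these practices in one query: a time predicate, an equality predicate on the measure name, an approximate aggregate, an explicit ORDER BY with a LIMIT, and the pagination token from each response. All database, table, and column names are hypothetical.

```python
import boto3

client = boto3.client("timestream-query")

query = """
SELECT device_id,
       approx_distinct(session_id) AS sessions,  -- approximate instead of COUNT(DISTINCT ...)
       avg(measure_value::double) AS avg_value
FROM "my_db"."my_table"
WHERE measure_name = 'cpu_usage'                 -- equality predicate on the measure name
  AND time > ago(1h)                             -- time range in the WHERE clause
GROUP BY device_id                               -- only the necessary column
ORDER BY avg_value DESC                          -- explicit ordering in the outermost query
LIMIT 10                                         -- top-N with LIMIT to reduce cost
"""

# Follow the pagination token until the result set is exhausted.
rows, next_token = [], None
while True:
    kwargs = {"QueryString": query}
    if next_token:
        kwargs["NextToken"] = next_token
    response = client.query(**kwargs)
    rows.extend(response["Rows"])
    next_token = response.get("NextToken")
    if next_token is None:
        break
```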

Eduardo Elias Saléh

Assuming you've followed best practices in both your table design and your query design, a major factor in the performance of your queries is the store retention property. In this example, data is in the memory store for 1 day, and then in the magnetic store for the remaining 89 days:

[Screenshot: table retention properties showing a 1-day memory store retention and an 89-day magnetic store retention]

This means that queries that are made over the last 24 hours should be very fast, whereas when queries span days or months, you can expect a large performance degradation.

So if you need, for example, fast performance for queries within a week but can accept slower performance beyond that range, consider increasing the memory store retention. Note, though, that this comes at a significant cost. At the time of writing, according to their pricing, the magnetic store costs 3.0 cents per GB stored per month, whereas the memory store costs 3.6 cents per GB stored per hour!
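If you decide the trade-off is worth it, retention can be changed on an existing table via the boto3 timestream-write client. A sketch, with hypothetical database and table names:

```python
import boto3

write_client = boto3.client("timestream-write")

# Keep one week in the (expensive) memory store; the rest of the
# 90-day retention window stays in the cheaper magnetic store.
write_client.update_table(
    DatabaseName="my_db",
    TableName="my_table",
    RetentionProperties={
        "MemoryStoreRetentionPeriodInHours": 7 * 24,
        "MagneticStoreRetentionPeriodInDays": 90,
    },
)
```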

stwr667