0

My Query

Should we expect slowness in retrieving data from storage within different Partitions in a specified period of time - say 1 hour - if the data in the Table Storage`Partitions within Table Storage` is expected to be very very very huge (say in millions)?

About My App

  1. My web app deals with receiving data for different signals from different devices.

  2. The frequency of data to be received from devices can be 1 min.

  3. The data thus received will be posted to Table Storage and displayed as and when received on a Dashboard.

  4. The data pertaining to a particular signal(s) within a selected period of time can also be queried to be displayed on page.

My problem

Currently the app is in testing and data comes in only whenever testing is happening. With this less amount of data, to query and fetch data from Table Storage it takes ~30 secs to fetch ~10,000 rows.

I have been reading here in different posts like Very Slow on Azure Table Storage Query on PartitionKey/RowKey List that says there is some delay in getting data from Table Storage.

So my query is

  1. When there are millions of data in a Table Storage \ within a Partition will a query to Table Storage make a complete table scan leading to heavy performance issue?

    1. One of my expected queries to retrieve data to display on my page is

    (((((((((((((PartitionKey eq 'D4AS1') or (PartitionKey eq 'D4AS2')) or (PartitionKey eq 'D4AS3')) or (PartitionKey eq 'D4AS4')) or (PartitionKey eq 'D4AS5')) or (PartitionKey eq 'D4AS6')) or (PartitionKey eq 'D4AS7')) or (PartitionKey eq 'D4AS8')) or (PartitionKey eq 'D4AS9')) or (PartitionKey eq 'D4AS10')) or (PartitionKey eq 'D4AS11')) or (PartitionKey eq 'D4AS133')) and (TimeReceived ge datetime'2018-02-21T23:53:40.4622407Z')) and (TimeReceived le datetime'2018-02-22T23:53:40.4622407Z') Should the above query be re framed for better performance? If so please suggest in what way it needs to be addressed?

    1. Whats the maximum delay we can expect on querying (simple\complex as above) Table Storage?
jAntoni
  • 591
  • 1
  • 12
  • 28
  • My assumption is that the query is bad and causing scans. Have you tried splitting it up into concurrent queries manually? – Cory Nelson Feb 23 '18 at 05:21
  • Is `TimeReceived` a row key? If not, you're doing a full partition scan. And since you specified multiple partition keys, you're doing multiple partition scans, if `TimeReceived` is just an additional property. – David Makogon Feb 23 '18 at 05:37
  • @CoryNelson Thanks . I didn't get `splitting it up into concurrent queries`. The above queries is dynamically formed - like from a input `List` of partition keys , a method frames query with Or\And Condition accordingly. Kindly elaborate on your suggestion. – jAntoni Feb 23 '18 at 05:37
  • @DavidMakogon `TimeReceived' is not the row key. Will changing the query to `Timestamp` property of `Table Storage` instead of the current `TimeReceived` help? Or do we have any other way to improve the performance? Please suggest. – jAntoni Feb 23 '18 at 05:41
  • `Timestamp` is not a rowkey. You'd need to store your `TimeReceived` property in the row key and try again. But... I can't help you design your table, especially without knowing all the types of queries you'll execute. The best thing you can do is look at the [Table Storage Design Guide](https://learn.microsoft.com/en-us/azure/cosmos-db/table-storage-design-guide) to better understand how tables and related queries work. – David Makogon Feb 23 '18 at 05:44

1 Answers1

1

When there are millions of data in a Table Storage \ within a Partition will a query to Table Storage make a complete table scan leading to heavy performance issue?

If your query include PartitionKey, then it will not do table scan. It will however do a partition scan (unless you have issued a point query using both PartitionKey and RowKey).

One of my expected queries to retrieve data to display on my page is (((((((((((((PartitionKey eq 'D4AS1') or (PartitionKey eq 'D4AS2')) or (PartitionKey eq 'D4AS3')) or (PartitionKey eq 'D4AS4')) or (PartitionKey eq 'D4AS5')) or (PartitionKey eq 'D4AS6')) or (PartitionKey eq 'D4AS7')) or (PartitionKey eq 'D4AS8')) or (PartitionKey eq 'D4AS9')) or (PartitionKey eq 'D4AS10')) or (PartitionKey eq 'D4AS11')) or (PartitionKey eq 'D4AS133')) and (TimeReceived ge datetime'2018-02-21T23:53:40.4622407Z')) and (TimeReceived le datetime'2018-02-22T23:53:40.4622407Z') Should the above query be re framed for better performance? If so please suggest in what way it needs to be addressed?

As mentioned by Steve in the answer you have linked, this query is not really optimized. You should create multiple queries and fire them in parallel. Once the results from all queries come back, you should combine them on the client side and present it to your user.

Whats the maximum delay we can expect on querying (simple\complex as above) Table Storage?

From this link, maximum time allotted for execution of a query is 5 seconds and maximum time allotted for the query is 30 seconds. From this link:

A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation.

Note that the total time allotted to the request for scheduling and processing the query is 30 seconds, including the five seconds for query execution.

Gaurav Mantri
  • 128,066
  • 12
  • 206
  • 241