
I have a table with the following fields in my DB:

Date/Time, Device ID, Sensor 1 Value, Sensor 2 Value.

This is part of an IoT project where each sensor sends data to the DB every second, for data analytics purposes.

We initially set Device ID as an index, foreseeing that the client would use the data from certain sensors only.

However, we found that our client mainly requests data based on date (e.g. data for the whole of 2013).

Due to the number of records, it takes days just to run a Select * filtered by datetime for one year of data (an estimated 10 million entries).

My question is: does it make sense to index Date/Time to speed up the query, given that the Date/Time field is already sorted during insertion?

If so, any suggestions to speed up such a query?
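To make the question concrete, here is a minimal sketch of the kind of index and query I mean (SQLite is used only for illustration; the table and column names are made up, not our real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        ts TEXT,           -- date/time stored as ISO-8601, so string order == time order
        device_id INTEGER,
        sensor1 REAL,
        sensor2 REAL
    )
""")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [("2013-06-01 00:00:00", 1, 1.0, 2.0),
     ("2013-06-01 00:00:01", 2, 1.1, 2.1),
     ("2014-01-01 00:00:00", 1, 1.2, 2.2)],
)

# Index the timestamp column so a date-range query can seek to the start
# of the range instead of scanning the whole table.
conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")

# Fetch everything for 2013 using a half-open range, which the index can use.
rows = conn.execute(
    "SELECT * FROM readings WHERE ts >= '2013-01-01' AND ts < '2014-01-01'"
).fetchall()
print(len(rows))  # -> 2
```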

Thank you.

sean
  • Does the table have a clustered primary key or index? That is what your data is "sorted" by. By default there is no sort order, so even if you insert your records chronologically, there is no guaranteed order unless you have a clustered key or index. – Jeremy Oct 02 '15 at 02:01
  • My suggestion would be: add pagination to the data source, so you only need a slice of the data instead of fetching the whole year at once; also, adding indexes will take up much more memory due to the large amount of data. – User2012384 Oct 02 '15 at 02:04
  • @Jeremy We do not have a primary key or index, as there are no unique fields in our dataset. Yes, you are right, we are inserting chronologically. – sean Oct 02 '15 at 02:04
  • @sean Check out this post: http://stackoverflow.com/questions/20050341/when-no-order-by-is-specified-what-order-does-a-query-choose-for-your-record If you are going to order or filter by a column, there should be an index on it. – Jeremy Oct 02 '15 at 02:24

2 Answers


Well, it sort of makes sense. The "sort of" is because you are fetching a large range of values -- an entire year. The index should help, because the data are all on adjacent data pages, which should be loaded into the cache. However, there is still a large number of rows, so the index adds some overhead even as it lets the query skip reading a bunch of rows.

Perhaps a better approach would be partitioning by date. Then you can construct the query so only a single partition is needed for a year (or month or day) of data. From what you say, I would investigate partitioning; the details depend on your database.
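As a rough sketch of the idea (SQLite has no native partitioning, so one table per year stands in for a date partition here; a real DBMS, e.g. MySQL's `PARTITION BY RANGE` or PostgreSQL's declarative partitioning, does this routing for you, and all names below are placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per year emulates a native date partition.
for year in (2013, 2014):
    conn.execute(f"""
        CREATE TABLE readings_{year} (
            ts TEXT, device_id INTEGER, sensor1 REAL, sensor2 REAL
        )
    """)

def insert(ts, device_id, s1, s2):
    # Route each row to its year's partition based on the timestamp prefix.
    year = ts[:4]
    conn.execute(f"INSERT INTO readings_{year} VALUES (?, ?, ?, ?)",
                 (ts, device_id, s1, s2))

insert("2013-03-01 12:00:00", 1, 1.0, 2.0)
insert("2013-07-04 12:00:00", 2, 1.1, 2.1)
insert("2014-01-01 00:00:00", 1, 1.2, 2.2)

# A "whole of 2013" query now touches only the 2013 partition,
# never the other years' data.
rows = conn.execute("SELECT * FROM readings_2013").fetchall()
print(len(rows))  # -> 2
```

The point of the design is that a year-level query reads only the storage belonging to that year, rather than filtering 10 million rows spread across all years.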

Gordon Linoff

You used the word "mainly", which suggests that your client may also run other types of queries from time to time.

So my suggestion is to have a composite clustered index on Device ID, then DateTime, for more flexible queries, and to use Google BigQuery (or something similar) for the heavy overall analysis, e.g. data for the whole of 2013.
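A minimal sketch of such a composite index and the query shape it serves (SQLite for illustration, with placeholder names; SQLite indexes are not clustered, but the column order works the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        ts TEXT, device_id INTEGER, sensor1 REAL, sensor2 REAL
    )
""")

# Composite index: the leading device_id column serves per-device lookups,
# and (device_id, ts) together serve device + date-range queries.
conn.execute("CREATE INDEX idx_device_ts ON readings (device_id, ts)")

conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [("2013-06-01 00:00:00", 1, 1.0, 2.0),
     ("2013-06-01 00:00:01", 2, 1.1, 2.1),
     ("2014-01-01 00:00:00", 1, 1.2, 2.2)],
)

# Device 1, whole of 2013: the index narrows this to one seek plus a short scan.
rows = conn.execute(
    "SELECT * FROM readings WHERE device_id = 1 "
    "AND ts >= '2013-01-01' AND ts < '2014-01-01'"
).fetchall()
print(len(rows))  # -> 1
```

Note the trade-off: with Device ID as the leading column, a pure date-range query across all devices cannot seek on this index, which is why the whole-year analysis is pushed to BigQuery in this answer.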

Robert