
When working with data values, should I create a single table storing the hourly values, and also the aggregated daily/monthly values, or should I create separate tables for these?

I'd imagine multiple tables would be the way to go, but I'm a complete amateur here. It sounds like something that would improve performance and possibly maintenance, but I'd also like to know if this even makes a difference. In the end, having 3-4 tables vs 1 could also cause some maintenance issues I would imagine.

So basically, a values_table containing:

id     value    datetime                 range
1      33       2022-05-13 11:00:00      hourly
2      54       2022-05-13 12:00:00      hourly
3      840      2022-05-13               daily
...

vs

hourly_values_table containing:

id     value    datetime
1      33       2022-05-13 11:00:00
2      54       2022-05-13 12:00:00
...

And a daily_values_table containing:

id     value    datetime
1      840      2022-05-13
...

What would be the proper way to handle this?

philipxy
dansan
  • Just store your data with a full precision timestamp, then generate the reports by day, hour, or minute, as needed. – Tim Biegeleisen May 13 '22 at 10:42
  • I have the impression that there's a lot of reluctance out there to use relational databases for what they're designed to do (aggregate information split in tables). What SQL query and indexes do you plan to use to calculate aggregated values if you pack everything in a single table? – Álvaro González May 13 '22 at 10:50
  • To add to what @TimBiegeleisen said, with either of your approaches you're basically going for denormalisation by storing "redundant" data. That's what can potentially cause maintenance issues. It _can_ be an option, but only for performance reasons. At least, don't mix "cached" data with raw data. – Álvaro González May 13 '22 at 10:55
  • @TimBiegeleisen My concern with this was performance. Let's say I'm working with millions of values, in such a case, storing aggregated data would be preferred, no? – dansan May 13 '22 at 11:06
  • It depends. _Derived_ data should generally not be stored long term for the very reason that it is derived from another table. Therefore, when the data in that other table changes, your derived data immediately becomes stale. Millions of values in a properly indexed table is no problem. – Tim Biegeleisen May 13 '22 at 11:07
  • @ÁlvaroGonzález In my current case, hourly is the raw data. So based on your last comment, storing the cached daily/monthly/whatever should be done in a separate table? Storing the aggregated data is only for performance. – dansan May 13 '22 at 11:09
  • @dansan Yes, store in one or more separate tables. But keep in mind that you will have to update these summary tables every time the original table has any DML done to it (or else the aggregate tables will become stale). – Tim Biegeleisen May 13 '22 at 11:12
  • @TimBiegeleisen That makes a lot of sense. I'm dealing with currency rates here, even if my question was worded slightly differently. 3 APIs, one returns hourly values, the others return daily. This was to minimize issues with API downtime, so 2 of them act more as backups. I wanted to store the data from all 3 APIs, instead of storing a single value. However, since one is hourly and two are daily, I found it harder to create a single table for it. EDIT: and I do aggregate the data from the hourly into daily values, but maybe I shouldn't? – dansan May 13 '22 at 11:27
  • This is a faq. Please before considering posting read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. PS A "basically" or "essentially" or "in other words" that doesn't introduce or summarize a clear, precise & full description that you also give just means "unclearly" or "it is false that". – philipxy May 13 '22 at 16:23
  • Does this answer your question? [How can you represent inheritance in a database?](https://stackoverflow.com/questions/3579079/how-can-you-represent-inheritance-in-a-database) – philipxy Jun 25 '22 at 09:59
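
To make the first comment's suggestion concrete, here is a minimal sketch of storing only the raw hourly rows and deriving daily figures at query time. It uses Python's built-in `sqlite3` purely so that it is runnable; the table name `hourly_values`, the column `ts`, the index name, and the choice of `SUM` as the aggregate are all assumptions for illustration, not something stated in the thread.

```python
import sqlite3

# In-memory database; table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly_values ("
    " id INTEGER PRIMARY KEY,"
    " value REAL NOT NULL,"
    " ts TEXT NOT NULL)"  # full-precision timestamp, ISO 8601 text
)
# An index on the timestamp keeps date-range queries cheap.
conn.execute("CREATE INDEX idx_hourly_ts ON hourly_values (ts)")

conn.executemany(
    "INSERT INTO hourly_values (value, ts) VALUES (?, ?)",
    [
        (33, "2022-05-13 11:00:00"),
        (54, "2022-05-13 12:00:00"),
        (10, "2022-05-14 09:00:00"),
    ],
)


def daily_totals(conn):
    """Derive daily values on demand instead of storing them.

    SUM is an assumption; swap in AVG or another aggregate as needed.
    """
    return conn.execute(
        "SELECT date(ts) AS day, SUM(value)"
        " FROM hourly_values GROUP BY day ORDER BY day"
    ).fetchall()


print(daily_totals(conn))
```

Nothing derived is stored here, so there is nothing that can go stale when the hourly rows change.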

2 Answers


Your hourly data is a Data Warehouse "Fact" table. It is, I assume, written 'continually' and never updated.

"Summary Table(s)" are useful for performance. Usually only 1 is needed. For example a "daily" table gives you about a 24x reduction. From that table you can fetch weekly, monthly, or any arbitrary date range reasonably efficiently. (I need more metrics and a better feel for what type of data you are storing to be surer of what I am saying.)
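
As a rough sketch of that idea (not the answerer's own code), the following keeps a single daily summary table in step with the fact table and then derives monthly figures from the summary alone. It uses Python's `sqlite3` only for runnability; the names `hourly_values`, `daily_summary`, `add_hourly`, and `monthly_totals`, and the use of a running sum, are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hourly_values (ts TEXT PRIMARY KEY, value REAL NOT NULL);
    CREATE TABLE daily_summary (
        day   TEXT PRIMARY KEY,
        total REAL NOT NULL,
        n     INTEGER NOT NULL  -- row count, so an average can be derived
    );
    """
)


def add_hourly(conn, ts, value):
    """Insert one fact row and fold it into that day's summary row."""
    day = ts[:10]  # 'YYYY-MM-DD' prefix of the ISO timestamp
    with conn:  # one transaction, so fact and summary stay consistent
        conn.execute(
            "INSERT INTO hourly_values (ts, value) VALUES (?, ?)", (ts, value)
        )
        cur = conn.execute(
            "UPDATE daily_summary SET total = total + ?, n = n + 1"
            " WHERE day = ?",
            (value, day),
        )
        if cur.rowcount == 0:  # first row seen for this day
            conn.execute(
                "INSERT INTO daily_summary (day, total, n) VALUES (?, ?, 1)",
                (day, value),
            )


def monthly_totals(conn):
    """Roll months up from the small summary table, not the fact table."""
    return conn.execute(
        "SELECT substr(day, 1, 7) AS month, SUM(total)"
        " FROM daily_summary GROUP BY month ORDER BY month"
    ).fetchall()


for ts, value in [
    ("2022-05-13 11:00:00", 33),
    ("2022-05-13 12:00:00", 54),
    ("2022-05-14 09:00:00", 10),
    ("2022-06-01 00:00:00", 5),
]:
    add_hourly(conn, ts, value)

print(monthly_totals(conn))
```

Because the fact rows are assumed to be insert-only, the summary only ever needs incrementing; updates or deletes on the fact table would need extra handling.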

I discuss using MySQL for DW and Summary tables

Sure, purists debate the storing of "redundant" data. But when you get a billion rows, you really need summary tables to avoid performance bottlenecks.

As for how long to hold onto the data in the Fact table or the Summary table, I often suggest:

  • Use Partitioning for speedy purging of old data (after, say, a month), thereby saving disk space;
  • Keep the summary tables 'forever', since they are 'small'.
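
The retention policy in the two points above can be sketched as follows. Note the substitution: SQLite (used here via Python's `sqlite3` so the example runs as-is) has no table partitioning, so a plain `DELETE` stands in for what would be a much cheaper `ALTER TABLE ... DROP PARTITION` in MySQL. The table names, the sample rows, and the helper `purge_fact_rows` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hourly_values (ts TEXT PRIMARY KEY, value REAL NOT NULL);
    CREATE TABLE daily_summary (day TEXT PRIMARY KEY, total REAL NOT NULL);
    INSERT INTO hourly_values VALUES
        ('2022-04-01 10:00:00', 7),
        ('2022-05-13 11:00:00', 33);
    INSERT INTO daily_summary VALUES
        ('2022-04-01', 7),
        ('2022-05-13', 33);
    """
)


def purge_fact_rows(conn, cutoff):
    """Delete fact rows older than `cutoff` ('YYYY-MM-DD').

    ISO 8601 text compares correctly as a string, so '<' works here.
    The summary table is deliberately left untouched.
    """
    with conn:
        cur = conn.execute("DELETE FROM hourly_values WHERE ts < ?", (cutoff,))
    return cur.rowcount


removed = purge_fact_rows(conn, "2022-05-01")
print(removed)
```

After the purge, the April detail is gone from the fact table but its daily total is still available from the summary.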
Rick James

I don't understand your purpose or your approach.

You have to start with the purpose of the database. What data are you trying to store, and why?

From reading your description I can't tell whether the data is supposed to be connected to a person or is for an accounting purpose. There's no context.

Start with the purpose of the database and this will identify the tables/names, which will then reveal the structure and relationships. And go to my post here for clarification, which could help conceptually. Link

Rex