Design database to store forecast time series

Question

Here is what the current design looks like:

Date_Created	Location	Data_Date	Data
date_time_1	a	date_time_1 + delta	##.##
date_time_1	a	date_time_1 + 2 * delta	##.##
date_time_1	a	...	...
date_time_1	a	date_time_1 + 1344 * delta	##.##
date_time_1	b	date_time_1 + delta	##.##
date_time_1	b	date_time_1 + 2 * delta	##.##
date_time_1	b	...	...
date_time_1	b	date_time_1 + 1344 * delta	##.##
date_time_2	a	date_time_2 + delta	##.##
date_time_2	a	date_time_2 + 2 * delta	##.##
date_time_2	a	...	...
date_time_2	a	date_time_2 + 1344 * delta	##.##
date_time_2	b	date_time_2 + delta	##.##
date_time_2	b	date_time_2 + 2 * delta	##.##
date_time_2	b	...	...
date_time_2	b	date_time_2 + 1344 * delta	##.##

Note

Date_Created has an interval of one day (new data comes in every day).
There are more than just two unique Location (it is 25 currently, and could grow in the future).
The Data_Date for each Location goes from Date_Created + delta to Date_Created + 1344 * delta where delta is 15 minutes.

Since a lot of data is being repeated right now, I am wondering what would be a better way to design the database to store this data. My initial thought was to have a separate table for each location. However, that still doesn't help with the repeated data in the Date_Created column.

I am most familiar with relational databases (MySQL) but open to other suggestions.

This is extremely unclear & apparently misconceived. What does "a lot of data is being repeated right now" mean? What do "repeated data" & "repeated data in the Date_Created column" mean? What is your justification per a design reference that it is a problem? There is no problem "data being repeated" here. There is no problem per se with a value appearing more than once in a column. Where are you 1st stuck in what published presentation of what design method? PS [Understanding Normalization & Duplicates](https://stackoverflow.com/a/44539858/3404097) — philipxy, Jul 02 '22 at 23:39
Please before considering posting read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. Reflect research in posts. SO/SE search is poor & literal & sometimes limited to titles, but read the help. Google re googling/searching, including in Q&A at [meta] & [meta.se]. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [ask] [Help] — philipxy, Jul 03 '22 at 00:54
If you want to ask whatever you're trying to ask, please edit your post to ask it clearly. Unfortunately questions with bounties cannot be voted closed. PS You might want to SO/SE search answers by me re "duplicat\* data" & "repeat\* data" & google re duplicate & repeated data in DB design with "site:stackoverflow.com". But mostly read some textbooks on information modelling & DB design. Since you're trying to do it. (Dozens are free online in pdf & html. Not all good though. Hence "textbooks" plural.) — philipxy, Jul 03 '22 at 01:48
Does this answer your question? [Understanding Normalization & Duplicates - I Guess I Don't - Adding Artist & Title Ids](https://stackoverflow.com/questions/44530971/understanding-normalization-duplicates-i-guess-i-dont-adding-artist-tit) — philipxy, Jul 10 '22 at 22:03

score 2 · Answer 1 · answered Jul 03 '22 at 12:45

I have concerns about the third column as it's not clear what those values mean.

A suggestion I could make for another database type would be a object based database like firebase from google.

This could look something like this:

"forecast_time_series" : {
    "a" : {
        "id1" : {
            "Date_Created": "date_time_1",
            "Data_Date": "date_time_1 + delta",
            "data": ##.## 
        },
        "id2" : {
            "Date_Created": "date_time_1",
            "Data_Date": "date_time_1 + 2 * delta",
            "data": "##.##"
        },
    }
}

And you could look for time series database types. I have never worked with those I can't help at this point but this type could be the best solution for your problem.

score -1 · Answer 2 · answered Jul 09 '22 at 07:38

I think you should sort your data by location but it may depend on what you want to do with the data. It could be like location as collections and days as documents.

I recommend Firebase Firestore. It is really easy to use and is really good for what you want to do. Also you can check GraphQL, I think it can help you with big documents.

K J · Answer 3 · 2022-07-10T00:38:48.190

Open Question unqualified what "data" contains nor why there are 1344 block entries, so forgive my folder naming as me not understanding that part, However in cases where you want to instantly find a PDF or DOC or whatever with "Fred" or a data file like ##.## by far the fastest and simplest is use a native NTFS DataStore there is nothing more file write instant/versatile or better protected than if used on a Raid Drive, the structure can be quickly changed in just a line or two from the console.

In this example I am searching for pdf object with Fred Blogs (see top right) and looking in the currently sparse tree for a blob named FB.Txt but you will store whatever those numbered var binary blobs are. Your lightweight DB of choice just needs to index the files with reference keywords for other searches / stats.

Design database to store forecast time series

3 Answers3