Best way to store Daily stock price data?

Question

I am using sqlite3 and python.

My issue is wrapping my head around how the table or tables should be organized to store stock price information.

Basically, each day I will run a script that fetches price data for a list of stock tickers. I want to record the data for each day so I can recall it later.

My first idea was to have a new table for each date, with a row for each stock ticker and the data stored in the relevant columns, but this seems a bit inefficient.

Another idea was to have one row per ticker, with the relevant info in the columns as before, but to include the date as well. Then when I run the script the next day it just adds new rows to the same table, and I can use an SQL query with WHERE or something similar to get the info back.

Am I thinking about this incorrectly?

That sounds like a start, you might want to create a project to start experimenting and find out how it went. As it is now what you posted is less of a question but more a collection of random thoughts on how you might tackle the problem, and it makes it difficult for anyone to even attempt to answer (What is [on-topic for StackOverflow](https://stackoverflow.com/help/on-topic)). You may wish to search around and you might find questions like [Database schema for organizing historical stock data](https://stackoverflow.com/questions/1523576/), which may be a good starting reference. — metatoaster, Mar 14 '21 at 07:47
Your first idea is a terrible one. Your second idea seems more like it, but why not read more about normalisation? — Strawberry, Mar 14 '21 at 08:45

score 0 · Answer 1 · answered Mar 14 '21 at 12:07

idea was to have a new table for each date

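Individual tables could get complicated. Say there is a need for a month's worth of data: every table for that month would have to be queried and the results combined. Each table will also take up a minimum of one database page (4k by default in SQLite), which could perhaps result in 2k of unused disk space per table.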
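As a rough illustration of the month-of-data point, here is a minimal Python sketch using the standard `sqlite3` module. It assumes a single hypothetical table `daily_price` with a date column (the table name, column names and prices are made up for the example), so a month of data is one `WHERE ... BETWEEN` query instead of one query per dated table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Hypothetical flat table: one row per ticker per day.
conn.execute(
    """
    CREATE TABLE daily_price (
        ticker      TEXT NOT NULL,
        price_date  TEXT NOT NULL,   -- ISO-8601 text, e.g. '2021-03-12'
        close_price REAL,
        PRIMARY KEY (ticker, price_date)
    )
    """
)

# Made-up sample rows spanning three months.
sample_rows = [
    ("ACME", "2021-02-26", 10.0),
    ("ACME", "2021-03-01", 10.5),
    ("ACME", "2021-03-12", 11.0),
    ("ACME", "2021-04-01", 12.0),
]
conn.executemany("INSERT INTO daily_price VALUES (?, ?, ?)", sample_rows)

# ISO-8601 dates sort as text, so BETWEEN selects a whole month.
march = conn.execute(
    """
    SELECT price_date, close_price
    FROM daily_price
    WHERE ticker = ? AND price_date BETWEEN ? AND ?
    ORDER BY price_date
    """,
    ("ACME", "2021-03-01", "2021-03-31"),
).fetchall()

print(march)
```

Storing the date as ISO-8601 text is only one option; the point is that the date is a column to filter on rather than part of a table name.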

Another idea was to have each row a new ticker with the relevant info as before in the columns but include the date as well.

This is closer, BUT what is the relevant data? I am not at all clued up on tickers, stocks etc. But if other relevant data is repeated, e.g. ACME being the code for the stock, then repeating that data time and time again can be a waste of space and inefficient.

Here's an example, intended as a pointer rather than a solution to use, that explains using tables with relationships that can reduce duplicated data (normalise the data to some extent).

For this simple example you'd likely have a table for the stock and its static or rarely changed but often used (referred to) data, such as the company name.

You'd then likely have a table for the activity (that would include the date) that references (has a relationship with) the more static data.

Say ACME changed its code. With your idea you'd have to change the code in every row where ACME was recorded (not hard but inefficient), compared with the single change needed if 2 related tables were used.
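The single-change point can be sketched in Python with the standard `sqlite3` module. This is only an illustration under assumptions: the `daily_price` table, its columns, the prices and the new code `ACMX` are hypothetical and not part of the SQL example that follows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    -- Parent table: static data, stored once per stock.
    CREATE TABLE stock (
        id      INTEGER PRIMARY KEY,
        code    TEXT UNIQUE,
        company TEXT
    );
    -- Hypothetical child table: one row per stock per day.
    CREATE TABLE daily_price (
        stock_id    INTEGER NOT NULL REFERENCES stock(id),
        price_date  TEXT NOT NULL,
        close_price REAL,
        PRIMARY KEY (stock_id, price_date)
    );
    INSERT INTO stock (id, code, company) VALUES (1, 'ACME', 'ACME COMPANY');
    INSERT INTO daily_price VALUES
        (1, '2021-03-11', 10.5),
        (1, '2021-03-12', 11.0);
    """
)

# The code change touches exactly one row in the parent table.
cur = conn.execute("UPDATE stock SET code = 'ACMX' WHERE code = 'ACME'")
changed = cur.rowcount

# Every price row picks up the new code through the join.
history = conn.execute(
    """
    SELECT stock.code, daily_price.price_date, daily_price.close_price
    FROM daily_price
    JOIN stock ON stock.id = daily_price.stock_id
    ORDER BY daily_price.price_date
    """
).fetchall()

print(changed, history)
```

The price rows only hold the numeric `stock_id`, so they never need rewriting when the code or company name changes.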

The following is an example of SQL that creates, populates and queries (extracts) the tables as discussed.

/* Deletes the 2 tables just in case they exist and allows the example to be rerun */
DROP TABLE IF EXISTS stock;
DROP TABLE IF EXISTS activity;
/* Create the stock table with 3 columns
    1st is a unique identifier that will be used to reference the stock
    2nd is for the stock code
    3rd is for the company name
*/
CREATE TABLE IF NOT EXISTS stock (  
    id INTEGER PRIMARY KEY,
    code TEXT,
    company TEXT
    )
;
/*
    Create the activity table with 4 columns
    id not really needed BUT exists anyway (search for rowid to find out more)
    the date and time the activity occurred, with a default value that will be the current timestamp (special);
        CURRENT_TIMESTAMP yields text in the form 'YYYY-MM-DD HH:MM:SS', so the column is declared TEXT
    the id value of the id column in the stock row to which the activity relates (the relationship)
    the amount (change) made
*/
CREATE TABLE IF NOT EXISTS activity (
    id INTEGER PRIMARY KEY,
    activity_date TEXT DEFAULT CURRENT_TIMESTAMP,
    related_stock INTEGER,
    activity_amount INTEGER
    )
;
/*
    Add 3 stock rows
*/
INSERT INTO stock (id,code,company) 
    VALUES
        (1,'ACME','ACME COMPANY')
        ,(2,'IBM','International Business Machines')
        ,(100,'MSFT','Microsoft')
;
/*
    add some activity for the stocks (see results)
*/
INSERT INTO activity
    (related_stock,activity_amount)
    VALUES
        (1,+100),(1,-10),(1,+8)
        ,(2,75),(100,60),(1,-5),(2,+10),(100,-4),(100,+16)
;

/* Get the results of the activities */
SELECT 
    stock.code,
    stock.company, 
    sum(activity.activity_amount) AS end_amount, /* use the aggregate function to sum the amounts in the group */
    group_concat(activity.activity_amount,' ') AS activities /* use the aggregate function to get all the values separated by a space */
FROM stock /* parent table */
JOIN activity ON activity.related_stock = stock.id /* related table and how to relate it */
GROUP BY /* make groups according to :- */
    stock.id
;

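When the above is run in a third-party SQLite tool (Navicat for SQLite was used), the query (SELECT ....) produces one row per stock: ACME with an end_amount of 93, IBM with 85 and MSFT with 72, each alongside the space-separated list of its activities.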
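To check the totals without a GUI tool, here is a minimal Python sketch using the standard `sqlite3` module and an in-memory database. It is a condensed copy of the tables and query above, with one assumption added: an `ORDER BY stock.id` so the row order is deterministic. The order of values inside `group_concat` is not guaranteed by SQLite, so only the sums are relied on here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # nothing is written to disk
conn.executescript(
    """
    CREATE TABLE stock (
        id INTEGER PRIMARY KEY,
        code TEXT,
        company TEXT
    );
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY,
        activity_date TEXT DEFAULT CURRENT_TIMESTAMP,
        related_stock INTEGER,
        activity_amount INTEGER
    );
    INSERT INTO stock (id, code, company) VALUES
        (1, 'ACME', 'ACME COMPANY'),
        (2, 'IBM', 'International Business Machines'),
        (100, 'MSFT', 'Microsoft');
    INSERT INTO activity (related_stock, activity_amount) VALUES
        (1, 100), (1, -10), (1, 8),
        (2, 75), (100, 60), (1, -5), (2, 10), (100, -4), (100, 16);
    """
)

rows = conn.execute(
    """
    SELECT stock.code,
           stock.company,
           sum(activity.activity_amount) AS end_amount,
           group_concat(activity.activity_amount, ' ') AS activities
    FROM stock
    JOIN activity ON activity.related_stock = stock.id
    GROUP BY stock.id
    ORDER BY stock.id  -- added here so the row order is deterministic
    """
).fetchall()

# Map each stock code to its summed amount.
totals = {code: end_amount for code, _company, end_amount, _acts in rows}

for row in rows:
    print(row)
```

The same pattern (connect, `executescript` for setup, `execute` for the query) would work against a file-based database by passing a path instead of `":memory:"`.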
