What should I use for a Database?

Question

My vb.net code calculates the growth rate of a company's stock price for every quarter from 1901 to present and stores it in a datatable. This takes a while to perform (10-15 minutes). I would like to save the information in the datatable after it is calculated so that I don't have to recalculate past growth rates every time I run the program. When I open my program I want the datatable to contain any growth rates that have already been calculated so I only have to calculate growth rates for new quarters.

Should I store my datatable in a database of some kind or is there another way to do this? My datatable is quite large. It currently has 450 columns (one for each quarter from 1901 to present) and can have thousands of rows (one for each company). Is this too big for Microsoft Access? Would Microsoft Excel be an option?

Thanks!

Excel (2007 & newer) will accommodate over 1 million rows, and over 16,000 columns. Access would probably be a better solution for a database. Alternatively, since you're just storing data, you could write out to a txt delimited file and just parse that file for your calculations. — David Zemens, Feb 20 '13 at 17:24
@DavidZemens Zemens I'm intrigued by the txt delimited file. I've thought about this before but I don't really know how to go about it. — gromit1, Feb 20 '13 at 17:31
@TarandeepGill Not only do people use Access but Access 2013 is shortly due for release. It is still hard to match for a RAD tool. — Fionnuala, Feb 20 '13 at 18:03
It depends on what you want to use the data for... If you're just looking to store it, a delimited text file would be perfect (you can do a search on SO for outputting a Datatable to delimited text file - It's really easy). If you want to do queries / calculations off of it, I would suggest a DB like MySQL (free and very powerful). I would suggest, though, as others have said, to standardize the tables so you don't have to add in a new column every quarter - Bad DB design principle!! — John Bustos, Feb 20 '13 at 18:14
For now I outputted my Datatable to a delimited text file and it is working. I will look into creating an appropriate Database soon. Thanks for your help everyone! — gromit1, Feb 22 '13 at 19:25

score 4 · Answer 1 · answered Feb 20 '13 at 18:22

First of all, it's unclear you actually need a database. If you don't need things such as concurrent access, client/server operation, ACID transactions etc... you might as well just implement your cache using the file system.

If you conclude you do need a DBMS, there are many good choices, including free such as: PostgreSQL, MS SQL Server Express, Oracle Express, MySQL, Firebird, SQLite etc... or commercial such as: Oracle, MS SQL Server, IBM DB2, Sybase etc...

I suggest you make your data model flexible, so you don't have to add new column for each new quarter:

enter image description here

This model is also well suited for clustering (if your DBMS of choice supports it), so the calculations belonging to the same company are stored physically close together in the database, potentially lowering the I/O during querying. Alternatively, you may choose to cluster on year/quarter.

score 3 · Answer 2 · edited Feb 20 '13 at 19:57

3

I would change the database design to:

ID
Quarter
Year
CompanyName
Value1
Value2
Value3

as your columns and start saving it as a vertical table.

Then, you don't have as much data as you think, so I'd recommend something free like mysql, or even nosql, since you're not doing anything but storing and retrieving the data. Any text based file: xml, csv, .xls that you use is going to be way slower because the entire file needs to get loaded into memory for you to be able to parse it.

edited Feb 20 '13 at 19:57

Steven Doggart

43,358
8
68
105

answered Feb 20 '13 at 17:34

RandomUs1r

4,010
1
24
44

Can't you read text files line-by-line in memory? Not necessary to open the file entirely or hold the entire file in memory, I think. – David Zemens Feb 20 '13 at 17:38
1

with csv's ya, you can use a buffered reader, xml works via the nodes/xpath, xls no chance, but what if your data is at the very end of the file? Murphy's law. Databases provide indexes as a solution. – RandomUs1r Feb 20 '13 at 17:42
@Syn123 I've looked into your database design and I believe I now have an adequate design but I'm not stuck trying to load the data into access from the internet. My problem is at http://stackoverflow.com/questions/15160781/how-to-load-csv-files-from-the-internet-into-an-access-database. Any thoughts would be greatly appreciated! – gromit1 Mar 01 '13 at 15:17

score 2 · Answer 3 · answered Feb 20 '13 at 17:21

Excel has a limit in regards to sizes of the sheets, and you shouldn't really ever use it as an explicit "database" for anything wish you wish to port over to different structures. It's good for things like spreadsheets and accounting in general, but you shouldn't use it for an absolute-truth database as is understood in computing. Also, Excel has a limit on the number of records that can be contained: Worksheet size 65,536 rows by 256 columns as of 2003

Access may work for this, but with the number of records you're looking at, you'll probably begin to experience issues with file sizes, slowdowns, and just general things like that. In situations when you start having more than 3,000 records at a time, it's probably better to use one of the big RDBMs or something like that; Oracle, MySQL, SQL Server, etc.

>104,000 rows and >16,000 columns in Excel 2007 and 2010. Excel 2003 is 10 year old software :) But otherwise, you're right, it is not a great substitute for database. — David Zemens, Feb 20 '13 at 17:37

score 1 · Answer 4 · answered Feb 20 '13 at 17:26

1

I think that the main problem might be the way you designed the database. A column for each quarter doesn't sound very good practice, specially when you have to change your DB schema every new quarter.

You could start with a MS Access database and then if you have any performance problems with it, migrate to a SQL Server database or something.

Again, I think that you should take a carefull look at your database design.

answered Feb 20 '13 at 17:26

Ricardo

11
1

Do you have any suggestions on a better database design? I'm really new to this and this is the best I could come up with. Knowing what the growth rate is for each quarter is essential though. Thanks! – gromit1 Feb 20 '13 at 17:30
I've looked into database design and I believe I now have an adequate design but I'm not stuck trying to load the data into access from the internet. My problem is at http://stackoverflow.com/questions/15160781/how-to-load-csv-files-from-the-internet-into-an-access-database. Any thoughts would be greatly appreciated! – gromit1 Mar 01 '13 at 15:16

MonkeyDoug · Answer 5 · 2013-02-20T17:52:13.073

1

I have a great deal of experience with stock data. Having tested quite a few methods, I think for a simple free method you should try SQL Server. The amount of data you are working with is just too much for Access (I imagine this is not the only calculations you would like). You can use SQL Server Express for free.

For this design I would create a database within SQL Server named HistoricalGrowthRate. I would have a table for each stock symbol and store the data in there.

One way to accomplish this is to have a separate database with a table that contains all the symbols you wish to follow (if you don't have can use the CompanyList.csv from Nasdaq). Loop through each symbol in that table and run a create table in HistoricalGrowthRate. When you wish to populate the values, simply loop again and insert your values. You could also just export from Access, which ever is faster for you.

This will decrease the load when you call for the information and provide an easy way to access the info. So, if you want the historical growth rate for AAPL, you simply set the connection string to your HistoricalGrowthRate database, refrence table AAPL and the extract the values.

edited Feb 20 '13 at 17:52

answered Feb 20 '13 at 17:39

MonkeyDoug

453
3
17

I'm very interested! It sounds like I am working on a a lot of similar things! I am looking to creating an appropriate database design. I am extremely interested in the CompanyList.csv from Nasdaq that you mentioned. How exactly would I download that? – gromit1 Feb 22 '13 at 19:27
1

They can be found at www.nasdaq.com/screening/company-list.aspx (bottom of page). Also there is a post with code to get 65k or so symbol using Maas's Yahoo finance API (http://stackoverflow.com/questions/5246843/how-to-get-a-complete-list-of-ticker-symbols-from-yahoo-finance), check out jm1102's response. I had to modify it slightly, but works well. – MonkeyDoug Feb 22 '13 at 21:58
1

About stock database design: (This is what I like, it isn't the only way to do it ofcourse) For things that I reference a lot, I like having them in separate databases by concept. I might have one for key calcs like growth rate, PE, or Price/Book. I might have another for historical prices and one for intraday. I keep them together so they exist in a place that makes sense to me, and I often don't only need one calc, I need many at the same time. – MonkeyDoug Feb 22 '13 at 22:27

What should I use for a Database?

5 Answers5