
I am using SQLite in my application. The scenario is that I have stock market data: each company is a separate database with one table, and that table stores anywhere from a couple of thousand to half a million records.

Currently, when I update the data in real time, I open a connection, check whether that particular record already exists, insert it if it does not, and close the connection. This is done in a loop, so each database (representing a company) is updated. The number of records inserted is low and is not the problem. But is this process okay?
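
Roughly, the update loop looks like this (a simplified sketch using System.Data.SQLite; the `Quotes` table and its columns are just placeholders for my real schema):

```
Imports System.Data.SQLite

Module Updater
    ' Quotes, QuoteDate and Price are placeholders for the real schema.
    Sub UpdateAllCompanies(ByVal companyDbPaths As List(Of String),
                           ByVal quoteDate As String, ByVal price As Double)
        For Each companyDb As String In companyDbPaths
            Using conn As New SQLiteConnection("Data Source=" & companyDb)
                conn.Open()
                ' 1) check whether this record already exists
                Dim found As Boolean
                Using checkCmd As New SQLiteCommand( _
                    "SELECT COUNT(*) FROM Quotes WHERE QuoteDate = @d", conn)
                    checkCmd.Parameters.AddWithValue("@d", quoteDate)
                    found = Convert.ToInt64(checkCmd.ExecuteScalar()) > 0
                End Using
                ' 2) insert only if it is missing
                If Not found Then
                    Using insertCmd As New SQLiteCommand( _
                        "INSERT INTO Quotes (QuoteDate, Price) VALUES (@d, @p)", conn)
                        insertCmd.Parameters.AddWithValue("@d", quoteDate)
                        insertCmd.Parameters.AddWithValue("@p", price)
                        insertCmd.ExecuteNonQuery()
                    End Using
                End If
            End Using ' connection closed and disposed here
        Next
    End Sub
End Module
```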

An alternative would be to have one database with many tables (each company as a table), where each table can hold a lot of records. Is this better or not?

You can expect around 500 companies. I am coding in VS 2010; the language is VB.NET.

soundslikeodd
Greatchap
  • I would do some reading on database normalization and database layout. I think you are overcomplicating it: one DB, one records table with a foreign key to the company table. Otherwise this question is still too vague. – OneFineDay Jan 18 '17 at 15:46
  • a) Rather than 2 queries, use an UPSERT (perhaps an sp) to insert or update as needed; b) use transactions. That's a lot of data for SQLite though; I might consider MySQL for that (plus the DB design elements @OneFineDay mentioned - you just need an identifier to indicate the company, not a whole new table). – Ňɏssa Pøngjǣrdenlarp Jan 18 '17 at 15:48
  • I am not using a transaction because only 1 record is inserted every x minutes. The problem is that lots of databases are being updated. The max would be around three hundred thousand records, not half a million as I mentioned earlier, and the file size is less than 20 MB for the db with three hundred thousand records. Should I stick to SQLite or go to MySQL? I don't think UPSERT will do the job here; I just want to insert a record if it does not exist, and if it exists, do nothing. – Greatchap Jan 18 '17 at 16:18
  • For inserting, you could ease your logic with something like this: http://stackoverflow.com/questions/19337029/insert-if-not-exists-statement-in-sqlite – MPelletier Jan 18 '17 at 16:35
  • Thank you @MPelletier for the Insert or Ignore solution. – Greatchap Jan 18 '17 at 17:32
  • `Insert or Ignore` == UPSERT. The data providers are able to store changes, allowing you to batch them and reduce the number of times you have to go to the DB. – Ňɏssa Pøngjǣrdenlarp Jan 18 '17 at 18:37

2 Answers


I did something similar, with similarly sized data in another field. It depends a lot on your indexes. Ultimately, separating each large table was best (one table per file, representing a cohesive unit, in your case one company). Plus you gain the advantage of every company table having the same name, versus having x tables with different names but the same schema (and no sanitizing of company names is required to create new tables).

Internally, other DBMSs often keep at least one file per table in their internal structure; SQL is thus just a layer of abstraction above that. SQLite (despite its creators' boasting) is meant for small projects, and querying larger data models gets more finicky to make it work well.
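
As a rough sketch of what I mean (the folder, table, and column names here are only placeholders), the per-company file handling can live in one small helper, and every company file gets the identical table:

```
Imports System.Data.SQLite

Module CompanyDbHelper
    ' Hypothetical helper: every company file shares the same table name and
    ' schema, so only the file path changes between companies.
    ' The caller is responsible for disposing the returned connection.
    Function OpenCompanyDb(ByVal dataFolder As String,
                           ByVal companySymbol As String) As SQLiteConnection
        Dim dbPath As String = System.IO.Path.Combine(dataFolder, companySymbol & ".db")
        Dim conn As New SQLiteConnection("Data Source=" & dbPath)
        conn.Open()
        ' Same DDL in every file; IF NOT EXISTS makes it safe to run repeatedly.
        Using cmd As New SQLiteCommand( _
            "CREATE TABLE IF NOT EXISTS Quotes (" & _
            "QuoteDate TEXT PRIMARY KEY, Price REAL)", conn)
            cmd.ExecuteNonQuery()
        End Using
        Return conn
    End Function
End Module
```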

MPelletier
  • I think SQLite automatically creates an index on the primary key, and I have only 1 primary key, so should I create an index myself or is it not required? Is MySQL for Windows faster than SQLite? – Greatchap Jan 18 '17 at 16:21
  • Only make the indexes you need. That primary key most definitely has an index, so if you don't need more, don't add more. And yes, MySQL will be faster. It's more setup though, and don't forget regular backups (in either case, come to think of it). – MPelletier Jan 18 '17 at 16:33
  • I think I will stick to SQLite as I want to build a Windows application with a small footprint; I mean installation should be simple and easy, and SQLite is easy to install and use. In fact, just two DLLs are needed with the application. – Greatchap Jan 18 '17 at 16:56

The optimal organization for your data is to make it properly normalized, i.e., put all data into a single table with a company column. This is better for performance because the table- and database-related overhead is reduced.

Queries can be sped up with indexes, but what indexes you need depends on the actual queries.
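
A minimal sketch of that layout (table and column names are placeholders): the company becomes just another column, and a composite primary key prevents duplicate rows, so the real-time update can be a single INSERT OR IGNORE:

```
Imports System.Data.SQLite

Module SingleTableExample
    Sub Demo()
        ' One database file, one table; the company is just another column.
        Dim ddl As String = _
            "CREATE TABLE IF NOT EXISTS Quotes (" & _
            " Company   TEXT NOT NULL," & _
            " QuoteDate TEXT NOT NULL," & _
            " Price     REAL," & _
            " PRIMARY KEY (Company, QuoteDate))"

        Using conn As New SQLiteConnection("Data Source=market.db")
            conn.Open()
            Using cmd As New SQLiteCommand(ddl, conn)
                cmd.ExecuteNonQuery()
            End Using
            ' Rows whose (Company, QuoteDate) pair already exists are skipped.
            Using ins As New SQLiteCommand( _
                "INSERT OR IGNORE INTO Quotes (Company, QuoteDate, Price) " & _
                "VALUES (@c, @d, @p)", conn)
                ins.Parameters.AddWithValue("@c", "ACME")
                ins.Parameters.AddWithValue("@d", "2017-01-18")
                ins.Parameters.AddWithValue("@p", 101.5)
                ins.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module
```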

CL.
  • I don't think this is a good idea, because if I put all records into one table with a company column there would be millions of records. Assuming one company has over a hundred thousand records and I have data for 500 companies, that becomes 50 million records (at least). That is too much data for that table. – Greatchap Jan 18 '17 at 17:32
  • Why would it be too much data? What exactly do you think would change? Have you actually measured it, and got results different from other people? – CL. Jan 18 '17 at 18:19
  • I did not measure anything, but don't you think that operations such as search, insert, or update will take a little longer as the database has to deal with so many records? Moreover, someone mentioned that SQLite is meant for small databases and that if the number of records increases substantially I should use MySQL, which I do not want to use. – Greatchap Jan 19 '17 at 04:37
  • SQLite's [limit](http://www.sqlite.org/limits.html#max_page_count) is 140 TB. Operations on B-trees do not get slower with the same rate as their size (it's actually O(log n)), and both SQLite and MySQL require indexes to do searches on large tables efficiently. A feature that only MySQL has is the ability to partition a database over multiple disks, but you are far away from that. – CL. Jan 19 '17 at 07:56
  • So what you are suggesting is that I have one database with 1 table that stores all the stock records, and it does not matter even if the record count reaches more than fifty million. According to you, this will yield better performance than having multiple databases. – Greatchap Jan 19 '17 at 09:00
  • The difference is probably not noticeable. But it will save the programmer the effort of handling all the tables. – CL. Jan 19 '17 at 10:38
  • If performance matters then I can try it, but otherwise it's risky: if one database file gets corrupted it's okay (as each company is a db), but if this master database gets corrupted then everything is gone. – Greatchap Jan 19 '17 at 11:07
  • What? Searching millions of records would be as fast *if not faster than* searching a subset of thousands? Am I reading this right? It will be different: searching the subset will be faster. The overhead of swapping database files can easily be wrapped and forgotten about. – MPelletier Jan 19 '17 at 22:06