
I am optimizing a database for my bachelor thesis, with very little prior knowledge of databases. I do not want you to do the work for me in any way, but I have some questions that no one has been able to answer so far.

Table Structure:

CREATE TABLE `data_inc` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `id_para` int(10) unsigned NOT NULL DEFAULT '0',
  `t_s` int(11) unsigned NOT NULL DEFAULT '0',
  `t_ms` smallint(6) unsigned NOT NULL DEFAULT '0',
  `t_ns` bigint(20) unsigned NOT NULL DEFAULT '0',
  `id_inst` smallint(6) NOT NULL DEFAULT '1',
  `value` varchar(255) NOT NULL DEFAULT '',
  `isanchor` tinyint(4) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`,`t_ns`),
  KEY `t_s` (`t_s`),
  KEY `t_ns` (`t_ns`)
) ENGINE=MyISAM AUTO_INCREMENT=2128295174 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (t_ns)
(PARTITION 19_02_2015_23_59 VALUES LESS THAN (1424386799000000000) ENGINE = MyISAM,
 PARTITION 20_02_2015_23_59 VALUES LESS THAN (1424473199000000000) ENGINE = MyISAM,
 PARTITION 21_02_2015_23_59 VALUES LESS THAN (1424559599000000000) ENGINE = MyISAM,
 PARTITION 22_02_2015_23_59 VALUES LESS THAN (1424645999000000000) ENGINE = MyISAM,
 PARTITION 23_02_2015_23_59 VALUES LESS THAN (1424732399000000000) ENGINE = MyISAM,
 PARTITION 24_02_2015_23_59 VALUES LESS THAN (1424818799000000000) ENGINE = MyISAM,
 PARTITION 25_02_2015_23_59 VALUES LESS THAN (1424905199000000000) ENGINE = MyISAM,
 PARTITION 05_03_2015_23_59 VALUES LESS THAN (1425596399000000000) ENGINE = MyISAM,
 PARTITION 13_03_2015_23_59 VALUES LESS THAN (1426287599000000000) ENGINE = MyISAM,
 PARTITION 14_03_2015_23_59 VALUES LESS THAN (1426373999000000000) ENGINE = MyISAM,
 PARTITION 15_03_2015_23_59 VALUES LESS THAN (1426460399000000000) ENGINE = MyISAM,
 PARTITION 16_03_2015_23_59 VALUES LESS THAN (1426546799000000000) ENGINE = MyISAM,
 PARTITION 17_03_2015_23_59 VALUES LESS THAN (1426633199000000000) ENGINE = MyISAM,
 PARTITION 18_03_2015_23_59 VALUES LESS THAN (1426719599000000000) ENGINE = MyISAM) */

The system is currently logging up to 4000 parameters per second into the database (into different tables; which table is used is decided in stored procedures). Every 5 minutes, every hour, and once a day, different scripts are called to analyse the logged data, and during that time data is still being written to the tables. This results in some heavy load right now. Is there a chance that switching from MyISAM to InnoDB (or another engine) would improve performance?

Thanks for your help!

Flo Win
  • It all depends upon the structure of the data, how you need to use it, and what index types you have. We need more information. – BenM Mar 11 '15 at 14:44
  • `Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow[...].` Besides that, if you just insert and read, and don't do updates on the data, which is typical for logging data, then you don't need transactions and you could use MyISAM, which in theory would be faster as it does not care about that feature. – t.niese Mar 11 '15 at 14:45
  • No way to answer this without lots more details, but... generally speaking, MyISAM will be faster than InnoDB because it's such a "limited" engine: no foreign keys, which eliminates a lot of overhead. On the other hand, InnoDB supports row-level locking, while MyISAM has only table locks. – Marc B Mar 11 '15 at 14:46
  • http://stackoverflow.com/questions/20148/myisam-versus-innodb – MatejG Mar 11 '15 at 14:49
  • @BenM I added more information; what else do you want to know? I'm happy to provide more info, but I don't really know what you need in order to make decisions. – Flo Win Mar 11 '15 at 15:06
  • @Wr1t3r the link you provided does not really help me any further; I had read it before I posted. – Flo Win Mar 11 '15 at 15:06

1 Answer


For logging quickly followed by analysis...

  1. Gather the data into a MyISAM staging table with no indexes. After 5 minutes that is roughly 1.2M rows.
  2. Analyze it into InnoDB "summary table(s)".
  3. DROP TABLE or TRUNCATE TABLE the staging table.

The analysis would be put into other table(s). These would have summary information and be much smaller than 1.2M rows.
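
A minimal sketch of that flow, assuming a hypothetical staging table staging_inc and a hypothetical summary table summary_5min (table names, columns, and the 5-minute window math are illustrative, not from the answer; value is a string here, so MIN/MAX are lexicographic):

-- Staging table: MyISAM, no indexes, so the 4000 inserts/second stay cheap.
CREATE TABLE staging_inc (
  id_para INT UNSIGNED    NOT NULL,
  t_ns    BIGINT UNSIGNED NOT NULL,
  id_inst SMALLINT        NOT NULL,
  value   VARCHAR(255)    NOT NULL
) ENGINE=MyISAM;

-- Hypothetical 5-minute summary table (InnoDB), one row per parameter per window.
CREATE TABLE summary_5min (
  window_start DATETIME     NOT NULL,
  id_para      INT UNSIGNED NOT NULL,
  row_count    INT UNSIGNED NOT NULL,
  min_value    VARCHAR(255) NOT NULL,
  max_value    VARCHAR(255) NOT NULL,
  PRIMARY KEY (window_start, id_para)
) ENGINE=InnoDB;

-- Every 5 minutes: atomically swap in an empty staging table so no incoming
-- rows are lost, summarize what was captured, then drop the old staging table.
CREATE TABLE staging_new LIKE staging_inc;
RENAME TABLE staging_inc TO staging_old, staging_new TO staging_inc;

INSERT INTO summary_5min (window_start, id_para, row_count, min_value, max_value)
SELECT FROM_UNIXTIME(FLOOR(t_ns / 1000000000 / 300) * 300) AS window_start,
       id_para,
       COUNT(*),
       MIN(value),
       MAX(value)
FROM   staging_old
GROUP  BY window_start, id_para;

DROP TABLE staging_old;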

To get hourly data, summarize the summary table(s). But don't create "hourly" tables; simply fetch and recalculate as needed.
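
Continuing the hypothetical summary_5min sketch above, the hourly figures could be recalculated at query time, for example:

-- Roll the 5-minute summary rows up to one row per parameter per hour,
-- computed on demand rather than stored in an "hourly" table.
SELECT id_para,
       DATE_FORMAT(window_start, '%Y-%m-%d %H:00:00') AS hour_start,
       SUM(row_count) AS row_count,
       MIN(min_value) AS min_value,
       MAX(max_value) AS max_value
FROM   summary_5min
WHERE  window_start >= NOW() - INTERVAL 1 DAY
GROUP  BY id_para, hour_start;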

Here are some related discussions: High speed ingestion and Summary Tables.

Rick James
  • I just came across your HowTos on Friday and I am currently trying to apply them to my DB. Do you know if it's possible (or easy) to add a partition to my table from the staging table? `ALTER TABLE t1 EXCHANGE PARTITION p WITH TABLE t2` or something like that? – Flo Win Mar 16 '15 at 08:14
  • That's an interesting "improvement" to my HowTo. But I will have to ponder it at length. Some initial thoughts: (1) that forces the same schema on the staging table; (2) any indexes required by #1 would slow down the inserts; (3) that could (would) lead to a large number of partitions, which has other performance issues; (4) etc?? – Rick James Mar 16 '15 at 20:35
  • (1) It does; however, in my case the fact table contains everything that comes in (which isn't optimal either). (2) That should be possible performance-wise; maybe this can only work if very few indexes (1, maybe 2) are required on the fact table. (3) I calculated it at a little over 2000 partitions for a 7-day table. Maybe MERGE tables would be better here? (4) I have a good chunk of spare performance in the system, as I save a lot by no longer working on one table (inserts and selects on the same MyISAM table). – Flo Win Mar 17 '15 at 08:00
  • 2000 partitions would, by itself, be a problem. Performance of `PARTITION` is not so good with more than, say, 50 partitions. There are cases where queries open all the partitions before realizing only one is needed. `MERGE` has had little support in over a decade; I would not go there. – Rick James Mar 17 '15 at 15:25
  • Why have seconds and milliseconds if you also have nanoseconds? (That would get rid of one index) – Rick James Mar 17 '15 at 15:25
  • `EXCHANGE PARTITION` would not work because of the `AUTO_INCREMENT`, correct? – Rick James Mar 17 '15 at 15:28
  • Instead of 4000 rows for 4000 values, how about 40 tables with 100 columns each? It would cut down on the ingestion problems and save a lot of disk space. – Rick James Mar 17 '15 at 15:30
  • The database grew from having t_s to having both t_s and t_ms, and therefore t_ns was introduced. To stay backwards-compatible, all three columns were kept. I will definitely get rid of the t_s and t_ms columns. – Flo Win Mar 17 '15 at 16:33
  • I'm not that deep into the subject yet, but I think `EXCHANGE PARTITION` could work if I got rid of the `AUTO_INCREMENT` and instead used the (t_ns, id_para) columns as the primary key (the combination should be distinct). Does that make sense, or am I on a completely wrong path? – Flo Win Mar 17 '15 at 16:38
  • What do your SELECTs look like? – Rick James Mar 17 '15 at 16:53
  • They fetch rows based on the timestamp `t_ns (IN, BETWEEN and >/<)` or based on the `id_para (IN, =)` column, out of both the processing table and the huge (fact?) table. Most of the `t_ns` queries would touch more than 20% of the data in a table holding 5-10 minutes of data (so a full table scan would be faster, according to your RoTs). For the table gathering all data for 7 days, an index would make sense because most of the time the queries fetch data spanning several minutes. The `id_para` SELECTs could use an index, as that column is very selective. – Flo Win Mar 17 '15 at 17:09
  • Is `id_inst` the "4000"? Yeah, sounds like `id_para` needs an index; keep in mind that all partitions would be probed for a given id_para value if there is nothing else in the `WHERE` clause. `PRIMARY KEY(id_para, t_ns)` (in that order), plus partitioning, would be a good compromise for the SELECTs you mention. – Rick James Mar 17 '15 at 20:48
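
A hedged sketch of what that last compromise might look like, using a hypothetical table name data_inc_new and placeholder partition names/boundaries (none of this is from the thread; the MyISAM-vs-InnoDB question is left as in the original table):

-- Drop the AUTO_INCREMENT id, use PRIMARY KEY(id_para, t_ns) for the
-- selective id_para lookups, and keep RANGE partitioning on t_ns so
-- whole days can be dropped cheaply.
CREATE TABLE data_inc_new (
  id_para  INT UNSIGNED     NOT NULL,
  t_ns     BIGINT UNSIGNED  NOT NULL,
  id_inst  SMALLINT         NOT NULL DEFAULT 1,
  value    VARCHAR(255)     NOT NULL DEFAULT '',
  isanchor TINYINT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (id_para, t_ns)
) ENGINE=MyISAM
PARTITION BY RANGE (t_ns) (
  PARTITION p20150317 VALUES LESS THAN (1426633199000000000),
  PARTITION p20150318 VALUES LESS THAN (1426719599000000000),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);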