I have been writing the results from an algorithm that calculates distances between customers in a InnoDB table. For example if my customers were A, B, C and D, the table in the database looks like this, among other columns:
From | To | Distance
A B 344
A C 274
A D 182
B C 338
And so on... It is a lot of rows I think I will hit 50 million.
The other columns are product_type and value. Those tell me how much the customer B (customer_to in the columns) buys of that product_type. That means that I have each pair multiple times depending on how many product_types the customer B buys.
I needed to to a query to group each customer with the products his neighbors buys and the value. The query looks like this:
select customer_from, product_type, avg(value) as opportunity
from customer_distances
where distance < 500
group by customer_from, product_type
order by opportunity desc;
The innodb table could not answer me that query. Despite I changed the net_read_timeout to 28800, the mysql connection was lost during the query.
I tough it had something to do with innodb build for transactional processing and not for intensive queries. So I created a new table with MyIsam as engine and insert-select all the records from the innodb table.
As expected, the select was very fast (70 segs) and all other selects like count( distinct customer_from), where almost instantaneous.
Just for curiosity I tried to continue the process of inserting the distances in the myisam table. It was a surprise for me when the program started to run at least 100 times faster than when it was working on the innodb table -for INSERTS!
For each customer the program inserts something like 3000 rows (one for each neighbor for each product_type. Something like 300 neighbors and 10 product_types per customer). With the innodb table inserting a single customer took something between 40 and 60 seconds (aprox. 3000 rows). With the myisam table, it takes 1 second to insert 3 customers (9000 rows aprox).
Some extra information:
- The mysql database is in my PC (localhost).
- The program written in java and is running from my pc.
- I'm using prepared statements and I only change the data between each row and the next. This is related to this question Why is myisam storage engine is faster than Innodb storage engine
So in summary the question is: Why is MyISAM that fast with insert statements? What do you think?
EDIT 1: I'm adding the create statements for both tables, the innodb and myisam. EDIT 2: I deleted some unuseful information and formated a little bit here and there.
/* INNODB TABLE */
CREATE TABLE `customer_distances` (
`customer_from` varchar(50) NOT NULL,
`customer_from_type` varchar(50) DEFAULT NULL,
`customer_from_segment` varchar(50) DEFAULT NULL,
`customer_from_district` int(11) DEFAULT NULL,
`customer_from_zone` int(11) DEFAULT NULL,
`customer_from_longitud` decimal(15,6) DEFAULT NULL,
`customer_from_latitud` decimal(15,6) DEFAULT NULL,
`customer_to` varchar(50) NOT NULL,
`customer_to_type` varchar(50) DEFAULT NULL,
`customer_to_segment` varchar(50) DEFAULT NULL,
`customer_to_district` int(11) DEFAULT NULL,
`customer_to_zone` int(11) DEFAULT NULL,
`customer_to_longitud` decimal(15,6) DEFAULT NULL,
`customer_to_latitud` decimal(15,6) DEFAULT NULL,
`distance` decimal(10,2) DEFAULT NULL,
`product_business_line` varchar(50) DEFAULT NULL,
`product_type` varchar(50) NOT NULL,
`customer_from_liters` decimal(10,2) DEFAULT NULL,
`customer_from_dollars` decimal(10,2) DEFAULT NULL,
`customer_from_units` decimal(10,2) DEFAULT NULL,
`customer_to_liters` decimal(10,2) DEFAULT NULL,
`customer_to_dollars` decimal(10,2) DEFAULT NULL,
`customer_to_units` decimal(10,2) DEFAULT NULL,
`liters_opportunity` decimal(10,2) DEFAULT NULL,
`dollars_opportunity` decimal(10,2) DEFAULT NULL,
`units_oportunity` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`cliente_desde`,`cliente_hasta`,`grupo`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/* MYISAM TABLE */
CREATE TABLE `customer_distances` (
`customer_from` varchar(50) NOT NULL,
`customer_from_type` varchar(50) DEFAULT NULL,
`customer_from_segment` varchar(50) DEFAULT NULL,
`customer_from_district` int(11) DEFAULT NULL,
`customer_from_zone` int(11) DEFAULT NULL,
`customer_from_longitud` decimal(15,6) DEFAULT NULL,
`customer_from_latitud` decimal(15,6) DEFAULT NULL,
`customer_to` varchar(50) NOT NULL,
`customer_to_type` varchar(50) DEFAULT NULL,
`customer_to_segment` varchar(50) DEFAULT NULL,
`customer_to_district` int(11) DEFAULT NULL,
`customer_to_zone` int(11) DEFAULT NULL,
`customer_to_longitud` decimal(15,6) DEFAULT NULL,
`customer_to_latitud` decimal(15,6) DEFAULT NULL,
`distance` decimal(10,2) DEFAULT NULL,
`product_business_line` varchar(50) DEFAULT NULL,
`product_type` varchar(50) NOT NULL,
`customer_from_liters` decimal(10,2) DEFAULT NULL,
`customer_from_dollars` decimal(10,2) DEFAULT NULL,
`customer_from_units` decimal(10,2) DEFAULT NULL,
`customer_to_liters` decimal(10,2) DEFAULT NULL,
`customer_to_dollars` decimal(10,2) DEFAULT NULL,
`customer_to_units` decimal(10,2) DEFAULT NULL,
`liters_opportunity` decimal(10,2) DEFAULT NULL,
`dollars_opportunity` decimal(10,2) DEFAULT NULL,
`units_oportunity` decimal(10,2) DEFAULT NULL,
PRIMARY KEY (`cliente_desde`,`cliente_hasta`,`grupo`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;