Needed help in optimizing order by and count query, I have tables having millions (approx 3 millions) rows.
I have to join 4 tables and fetch the records, When i run the simple query it takes only millisecond to complete but as I try to count or order by having left join table it get stuck for unlimited of time.
Please see the cases below.
DB Server Configuration:
CPU Number of virtual cores: 4
Memory(RAM): 16 GiB
Network Performance: High
Rows in each table:
tbl_customers - #Rows: 20 million.
tbl_customers_address - #Row 25 million.
tbl_shop_setting - #Rows 50k
aio_customer_tracking - #Rows 5k
Tables Schema:
CREATE TABLE `tbl_customers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(225) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`accepts_marketing` TINYINT(1) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_name` VARCHAR(50) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`last_order_id` BIGINT(20) NULL DEFAULT NULL,
`total_spent` DECIMAL(12,2) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL COLLATE 'latin1_swedish_ci',
`verified_email` TINYINT(4) NULL DEFAULT NULL,
`updated_at` DATETIME NULL DEFAULT NULL,
`created_at` DATETIME NULL DEFAULT NULL,
`date_updated` DATETIME NULL DEFAULT NULL,
`date_created` DATETIME NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE INDEX `shopify_customer_id_unique` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `shop_id` (`shop_id`)
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_customers_address` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`customer_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_address_id` BIGINT(20) NULL DEFAULT NULL,
`shopify_customer_id` BIGINT(20) NULL DEFAULT NULL,
`first_name` VARCHAR(50) NULL DEFAULT NULL,
`last_name` VARCHAR(50) NULL DEFAULT NULL,
`company` VARCHAR(50) NULL DEFAULT NULL,
`address1` VARCHAR(250) NULL DEFAULT NULL,
`address2` VARCHAR(250) NULL DEFAULT NULL,
`city` VARCHAR(50) NULL DEFAULT NULL,
`province` VARCHAR(50) NULL DEFAULT NULL,
`country` VARCHAR(50) NULL DEFAULT NULL,
`zip` VARCHAR(15) NULL DEFAULT NULL,
`phone` VARCHAR(20) NULL DEFAULT NULL,
`name` VARCHAR(50) NULL DEFAULT NULL,
`province_code` VARCHAR(5) NULL DEFAULT NULL,
`country_code` VARCHAR(5) NULL DEFAULT NULL,
`country_name` VARCHAR(50) NULL DEFAULT NULL,
`longitude` VARCHAR(250) NULL DEFAULT NULL,
`latitude` VARCHAR(250) NULL DEFAULT NULL,
`default` TINYINT(1) NULL DEFAULT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `customer_id` (`customer_id`),
INDEX `shopify_address_id` (`shopify_address_id`),
INDEX `shopify_customer_id` (`shopify_customer_id`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
CREATE TABLE `tbl_shop_setting` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`shop_name` VARCHAR(300) NOT NULL COLLATE 'latin1_swedish_ci',
PRIMARY KEY (`id`),
)
COLLATE='utf8mb4_general_ci'
ENGINE=InnoDB;
CREATE TABLE `aio_customer_tracking` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`shopify_customer_id` BIGINT(20) UNSIGNED NOT NULL,
`email` VARCHAR(255) NULL DEFAULT NULL,
`shop_id` BIGINT(20) UNSIGNED NOT NULL,
`domain` VARCHAR(255) NULL DEFAULT NULL,
`web_session_count` INT(11) NOT NULL,
`last_seen_date` DATETIME NULL DEFAULT NULL,
`last_contact_date` DATETIME NULL DEFAULT NULL,
`last_email_open` DATETIME NULL DEFAULT NULL,
`created_date` DATETIME NOT NULL,
`is_geo_fetched` TINYINT(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
INDEX `shopify_customer_id` (`shopify_customer_id`),
INDEX `email` (`email`),
INDEX `shopify_customer_id_shop_id` (`shopify_customer_id`, `shop_id`),
INDEX `last_seen_date` (`last_seen_date`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB;
Query Cases Running and Not Running:
1. Running: Below query fetch the records by joining all the 4 tables, It takes only 0.300 ms.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
2. Not running: Simply when try to get the count of these row stuk the query, I waited 10 min but still running.
SELECT
COUNT(DISTINCT c.shopify_customer_id) -- what makes #2 different
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
LIMIT 20
3. Not running: In the #1 query we simply put the 1 Order by clause and it get stuck, I waited 10 min but still running. I study query optimization some article and tried by indexing, Right Join etc.. but still not working.
SELECT `c`.first_name,`c`.last_name,`c`.email, `t`.`last_seen_date`, `t`.`last_contact_date`, `ssh`.`shop_name`, ca.`company`, ca.`address1`, ca.`address2`, ca.`city`, ca.`province`, ca.`country`, ca.`zip`, ca.`province_code`, ca.`country_code`
FROM `tbl_customers` AS `c`
JOIN `tbl_shop_setting` AS `ssh` ON c.shop_id = ssh.id
LEFT JOIN (SELECT shopify_customer_id, last_seen_date, last_contact_date FROM aio_customer_tracking GROUP BY shopify_customer_id) as t ON t.shopify_customer_id = c.shopify_customer_id
LEFT JOIN `tbl_customers_address` as ca ON (c.shopify_customer_id = ca.shopify_customer_id AND ca.default = 1)
GROUP BY c.shopify_customer_id
ORDER BY `t`.`last_seen_date` -- what makes #3 different
LIMIT 20
Any suggestion to optimize the query, table structure are welcome.
WHAT I'M TRYING TO DO:
tbl_customers
table contains the customer info, tbl_customer_address
table contains the addresses of the customers(one customer may have multiple address), And aio_customer_tracking
table contains visiting records of the customer last_seen_date
is the visiting date.
Now, simply I want to fetch and count the customers, with their one of the address, and visiting info. Also, I may order by any of the column from these 3 tables, In my example i am ordering by last_seen_date (the default order). Hope this explanation helps to understand what i am trying to do.