Optimizing Group by performance for EAV databsae model

Question

I am using the below query and it is currently taking 0.7 sec. But when a performance test was run with 200 concurrent users the query was taking more than 60 sec. I have defined required composite indexes but still group by is not considering the index.

We are using RDS MYSQL service (16 GB RAM 8 vCPU)

Query

SELECT 
    patient.voided AS voided, COUNT(*) AS count FROM patient patient
        INNER JOIN person person ON patient.patient_id = person.person_id
        INNER JOIN person_attribute attr ON patient.patient_id = attr.person_id
        INNER JOIN person_attribute_type attr_type ON attr.person_attribute_type_id = attr_type.person_attribute_type_id
WHERE    
    attr.value = 'd31fe20e-6736-42ff-a3ed-b3e622e80842'
        AND attr_type.name = 'LocationAttribute'
GROUP BY patient.voided;

Execution Plan

Is it possible to improve the query performance without changing query and by adding indexes or any db optimizations? Because the query was generated by hibernate framework and any code changes requires testing effort.

Here are the definitions of the tables in question from OpenMRS 1.6.

CREATE TABLE IF NOT EXISTS `openmrs`.`person_attribute` (
  `person_attribute_id` INT(11) NOT NULL AUTO_INCREMENT,
  `person_id` INT(11) NOT NULL DEFAULT '0',
  `value` VARCHAR(50) NOT NULL DEFAULT '',
  `person_attribute_type_id` INT(11) NOT NULL DEFAULT '0',
  `creator` INT(11) NOT NULL DEFAULT '0',
  `date_created` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `changed_by` INT(11) NULL DEFAULT NULL,
  `date_changed` DATETIME NULL DEFAULT NULL,
  `voided` SMALLINT(6) NOT NULL DEFAULT '0',
  `voided_by` INT(11) NULL DEFAULT NULL,
  `date_voided` DATETIME NULL DEFAULT NULL,
  `void_reason` VARCHAR(255) NULL DEFAULT NULL,
  `uuid` CHAR(38) NOT NULL,
  PRIMARY KEY (`person_attribute_id`),
  UNIQUE INDEX `person_attribute_uuid_index` (`uuid` ASC) VISIBLE,
  INDEX `identifies_person_idx` (`person_id` ASC) VISIBLE,
  INDEX `defines_attribute_type_idx` (`person_attribute_type_id` ASC) VISIBLE,
  INDEX `attribute_creator` (`creator` ASC) VISIBLE,
  INDEX `attribute_changer` (`changed_by` ASC) VISIBLE,
  INDEX `attribute_voider` (`voided_by` ASC) VISIBLE,
  CONSTRAINT `defines_attribute_type`
    FOREIGN KEY (`person_attribute_type_id`)
    REFERENCES `openmrs`.`person_attribute_type` (`person_attribute_type_id`),
  CONSTRAINT `identifies_person`
    FOREIGN KEY (`person_id`)
    REFERENCES `openmrs`.`person` (`person_id`))
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;

CREATE TABLE IF NOT EXISTS `openmrs`.`person_attribute_type` (
  `person_attribute_type_id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(50) NOT NULL DEFAULT '',
  `description` TEXT NOT NULL,
  `format` VARCHAR(50) NULL DEFAULT NULL,
  `foreign_key` INT(11) NULL DEFAULT NULL,
  `searchable` SMALLINT(6) NOT NULL DEFAULT '0',
  `creator` INT(11) NOT NULL DEFAULT '0',
  `date_created` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
  `changed_by` INT(11) NULL DEFAULT NULL,
  `date_changed` DATETIME NULL DEFAULT NULL,
  `retired` SMALLINT(6) NOT NULL DEFAULT '0',
  `retired_by` INT(11) NULL DEFAULT NULL,
  `date_retired` DATETIME NULL DEFAULT NULL,
  `retire_reason` VARCHAR(255) NULL DEFAULT NULL,
  `edit_privilege` VARCHAR(255) NULL DEFAULT NULL,
  `uuid` CHAR(38) NOT NULL,
  `sort_weight` DOUBLE NULL DEFAULT NULL,
  PRIMARY KEY (`person_attribute_type_id`),
  UNIQUE INDEX `person_attribute_type_uuid_index` (`uuid` ASC) VISIBLE,
  INDEX `name_of_attribute` (`name` ASC) VISIBLE,
  INDEX `type_creator` (`creator` ASC) VISIBLE,
  INDEX `attribute_type_changer` (`changed_by` ASC) VISIBLE,
  INDEX `attribute_is_searchable` (`searchable` ASC) VISIBLE,
  INDEX `user_who_retired_person_attribute_type` (`retired_by` ASC) VISIBLE,
  INDEX `person_attribute_type_retired_status` (`retired` ASC) VISIBLE,
  INDEX `privilege_which_can_edit` (`edit_privilege` ASC) VISIBLE)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8;

When asking [tag:query-optimization] questions, it helps if you make a [dbfiddle](https://dbfiddle.uk/) so we can see exactly how you defined your tables. You might think you have the right indexes, but perhaps there's another that can be suggested — that's why you asked this question after all. Help us to help you! — Bill Karwin, Mar 10 '22 at 16:07
I do see in your execution plan a high value for `rows` for the `attr` table, which probably is a lot of the high cost of the query. This suggests a new index that matches the conditions more precisely is needed. — Bill Karwin, Mar 10 '22 at 16:09
The OpenMRS schema is [here](https://wiki.openmrs.org/display/docs/Data+Model?preview=/589829/393221/Openmrs_data_model_1.6.svg). — O. Jones, Mar 11 '22 at 18:11

score 1 · Answer 1 · answered Mar 10 '22 at 20:12

Over-normalization.

You need to restructure the schema to avoid running off to other tables as much. Note that the WHERE requires tests on two tables. There is virtually no way to optimize that except by moving the name and value into the same table. For that matter, the person_id should be in that table, too.

If the framework is generating the schema as well as the query, then it should be relegated to 'toy' projects, not even medium-sized datasets like yours.

I've added a tag that will link to other discussions of EAV. Even after cleaning up these tables, EAV groans when given millions of rows.

These composite indexes may help:

attr:  INDEX(value, person_id,  person_attribute_type_id)
attr_type:  INDEX(name, person_attribute_type_id)

But, if value is TEXT, that first one cannot be added to the table.

For further discussion, please provide SHOW CREATE TABLE for each table.

Judging from the application's schema, I agree with Rick's index suggestions. ``Another widely deployed open-source system with tons of single-column indexes but not carefully designed indexes. WTF? WTF? `` WordPress is the other one. — O. Jones, Mar 11 '22 at 18:17

Optimizing Group by performance for EAV databsae model

1 Answers1