0

I am trying to select one property per state from a table with 1,000,000 properties. I am trying

select * from properties
where latitude is not null and longitude is not null
group by property_state;

But the query takes 3 seconds. I have an index on latitude and longitude, and an index on state. I tried adding a third index on all 3 columns, but that did not help. Any ideas?

Here is the create table code, if that helps (I removed new index that did not help)

CREATE TABLE `t_national_comps` (
`deal_Id` INT(11) NULL DEFAULT NULL,
`nc_id` INT(11) NOT NULL AUTO_INCREMENT,
`property_id` INT(15) NULL DEFAULT NULL,
`reonomy_property_id` VARCHAR(50) NULL DEFAULT NULL,
`reonomy_url` VARCHAR(80) NULL DEFAULT NULL,
`confidence` FLOAT NULL DEFAULT NULL,
`latitude` DECIMAL(11,8) NULL DEFAULT NULL,
`longitude` DECIMAL(11,8) NULL DEFAULT NULL,
`prop_key` VARCHAR(255) NULL DEFAULT NULL,
`fmt_address` VARCHAR(255) NULL DEFAULT NULL,
`property_street_number` VARCHAR(20) NULL DEFAULT NULL,
`property_street_name` VARCHAR(40) NULL DEFAULT NULL,
`property_street_mode` VARCHAR(20) NULL DEFAULT NULL,
`property_city` VARCHAR(40) NULL DEFAULT NULL,
`property_state` VARCHAR(10) NULL DEFAULT NULL,
`property_zip` VARCHAR(10) NULL DEFAULT NULL,
`property_zip4` VARCHAR(10) NULL DEFAULT NULL,
`municipality` VARCHAR(40) NULL DEFAULT NULL,
`property_class_id` VARCHAR(15) NULL DEFAULT NULL,
`std_land_use_code` VARCHAR(15) NULL DEFAULT NULL,
`sale_doc_num` VARCHAR(30) NULL DEFAULT NULL,
`mortgage_doc_num` VARCHAR(30) NULL DEFAULT NULL,
`mortgage_date` DATE NULL DEFAULT NULL,
`lender` VARCHAR(100) NULL DEFAULT NULL,
`bank_id` INT(11) NULL DEFAULT NULL,
`loan_amount` BIGINT(15) NULL DEFAULT NULL,
`maturity_date` DATE NULL DEFAULT NULL,
`rate` VARCHAR(20) NULL DEFAULT NULL,
`sale_date` DATE NULL DEFAULT NULL,
`curr_sale_contract_date` DATE NULL DEFAULT NULL,
`curr_sale_document_type` VARCHAR(20) NULL DEFAULT NULL,
`sale_price` BIGINT(22) NULL DEFAULT NULL,
`curr_sale_buyer1_full_name` VARCHAR(60) NULL DEFAULT NULL,
`curr_sale_buyer2_full_name` VARCHAR(60) NULL DEFAULT NULL,
`reported_owner` VARCHAR(60) NULL DEFAULT NULL,
`mailing_address` VARCHAR(500) NULL DEFAULT NULL,
`curr_sale_seller1_full_name` VARCHAR(60) NULL DEFAULT NULL,
`curr_sale_seller2_full_name` VARCHAR(60) NULL DEFAULT NULL,
`sq_footage` VARCHAR(10) NULL DEFAULT NULL,
`resi_units` VARCHAR(10) NULL DEFAULT NULL,
`commercial_units` VARCHAR(10) NULL DEFAULT NULL,
`num_floors` VARCHAR(10) NULL DEFAULT NULL,
`num_buildings` VARCHAR(10) NULL DEFAULT NULL,
`price_per_sq_ft` INT(11) NULL DEFAULT NULL,
`price_per_unit` INT(11) NULL DEFAULT NULL,
`property_type_id` INT(11) NULL DEFAULT NULL,
`property_type` VARCHAR(60) NULL DEFAULT NULL,
`long_lat_point` POINT NULL DEFAULT NULL,
PRIMARY KEY (`nc_id`),
INDEX `t_national_comps_latitude_longitude_index` (`latitude`, `longitude`),
INDEX `t_national_comps_property_city_index` (`property_city`),
INDEX `t_national_comps_property_state_index` (`property_state`),
INDEX `t_national_comps_sale_date_index` (`sale_date`),
INDEX `t_national_comps_point_index` (`long_lat_point`(25)),
INDEX `t_national_comps_reonomy_id_index` (`reonomy_property_id`),
INDEX `mailing_address_index` (`mailing_address`),
INDEX `mortgage_date_index` (`mortgage_date`),
INDEX `t_national_comps_lender_index` (`lender`),
INDEX `bank_id_index` (`bank_id`),
INDEX `street_num_and_zip` (`property_street_number`, `property_zip`)
);

EDIT
The reason I did not aggregate anything in the query, is because I have nothing to aggregate. I know that is not the primary use of group by, but it is commonly used as such, just to get one of each record.

I was able to speed up the query by forcing the use of an index on all 3 columns, like

select latitude, longitude, property_street_number, property_street_name, 
property_city, property_state, property_zip from properties
USE INDEX (lat_long_state_index)
where latitude is not null and longitude is not null
group by property_state;

but I am still looking for more optimization.
Thanks to all for your help.

arecaps
  • 155
  • 1
  • 10
  • 2
    Have You tried running `EXPLAIN` on the query? – Roman Hocke Jul 18 '18 at 14:45
  • 2
    What is the purpose of the `GROUP BY`, when You do not aggregate anything in the `SELECT`? What columns does Your table have and what result do You want to get from the query? – Roman Hocke Jul 18 '18 at 14:47
  • @Roman Hocke Explain tells me it is using the state index, and scanning 907,000 rows. I just need any property from each state, I don't care which, as long as it has a longitude and latitude. – arecaps Jul 18 '18 at 14:52
  • 2
    @arecaps if that's what you want I'd suggest this question. https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group – sertsedat Jul 18 '18 at 14:56
  • @Bill Karwin not sure if that tag applies, I dont need the overhead of getting specifically the greatest, any if good for me. – arecaps Jul 18 '18 at 15:10
  • Use proper `GROUP BY`. – Eric Jul 18 '18 at 15:55
  • 2
    What's the purpose of `GROUP BY` here??? Have you Googled what `GROUP BY` is for??? – Eric Jul 18 '18 at 15:57

2 Answers2

1

Group By

I am not convinced Group By should be used in such a way, although internally MySQL could be smart enough ( I am not sure) to use a Distinct when it sees a group by with no aggregation, but i don't think it's a correct way to use Group By.

Index

MySQL uses one index per table per query and it will only pick one, so before you have the three columns, it's correct to pick the index with property_state because MySQL generally won't use index for not equal condition.

You can do a EXPLAIN compare the query before and after force index. MySQL optimizer believe the single column index is better.

Too many indexes will also add overhead to insert. After you have the three columns index, you can actually delete the property_state index because it's covered by the three columns index(leftmost). Your future query will surely use the new index you created.

Jacob
  • 1,776
  • 14
  • 11
0

My suggestion is do not use select *. Instead of using select * use select id, .... This will definitely reduce your execution time.