I am trying to select one property per state from a table with 1,000,000 properties. I am trying
select * from properties
where latitude is not null and longitude is not null
group by property_state;
But the query takes 3 seconds. I have an index on latitude and longitude, and an index on state. I tried adding a third index on all 3 columns, but that did not help. Any ideas?
Here is the create table code, if that helps (I removed new index that did not help)
CREATE TABLE `t_national_comps` (
`deal_Id` INT(11) NULL DEFAULT NULL,
`nc_id` INT(11) NOT NULL AUTO_INCREMENT,
`property_id` INT(15) NULL DEFAULT NULL,
`reonomy_property_id` VARCHAR(50) NULL DEFAULT NULL,
`reonomy_url` VARCHAR(80) NULL DEFAULT NULL,
`confidence` FLOAT NULL DEFAULT NULL,
`latitude` DECIMAL(11,8) NULL DEFAULT NULL,
`longitude` DECIMAL(11,8) NULL DEFAULT NULL,
`prop_key` VARCHAR(255) NULL DEFAULT NULL,
`fmt_address` VARCHAR(255) NULL DEFAULT NULL,
`property_street_number` VARCHAR(20) NULL DEFAULT NULL,
`property_street_name` VARCHAR(40) NULL DEFAULT NULL,
`property_street_mode` VARCHAR(20) NULL DEFAULT NULL,
`property_city` VARCHAR(40) NULL DEFAULT NULL,
`property_state` VARCHAR(10) NULL DEFAULT NULL,
`property_zip` VARCHAR(10) NULL DEFAULT NULL,
`property_zip4` VARCHAR(10) NULL DEFAULT NULL,
`municipality` VARCHAR(40) NULL DEFAULT NULL,
`property_class_id` VARCHAR(15) NULL DEFAULT NULL,
`std_land_use_code` VARCHAR(15) NULL DEFAULT NULL,
`sale_doc_num` VARCHAR(30) NULL DEFAULT NULL,
`mortgage_doc_num` VARCHAR(30) NULL DEFAULT NULL,
`mortgage_date` DATE NULL DEFAULT NULL,
`lender` VARCHAR(100) NULL DEFAULT NULL,
`bank_id` INT(11) NULL DEFAULT NULL,
`loan_amount` BIGINT(15) NULL DEFAULT NULL,
`maturity_date` DATE NULL DEFAULT NULL,
`rate` VARCHAR(20) NULL DEFAULT NULL,
`sale_date` DATE NULL DEFAULT NULL,
`curr_sale_contract_date` DATE NULL DEFAULT NULL,
`curr_sale_document_type` VARCHAR(20) NULL DEFAULT NULL,
`sale_price` BIGINT(22) NULL DEFAULT NULL,
`curr_sale_buyer1_full_name` VARCHAR(60) NULL DEFAULT NULL,
`curr_sale_buyer2_full_name` VARCHAR(60) NULL DEFAULT NULL,
`reported_owner` VARCHAR(60) NULL DEFAULT NULL,
`mailing_address` VARCHAR(500) NULL DEFAULT NULL,
`curr_sale_seller1_full_name` VARCHAR(60) NULL DEFAULT NULL,
`curr_sale_seller2_full_name` VARCHAR(60) NULL DEFAULT NULL,
`sq_footage` VARCHAR(10) NULL DEFAULT NULL,
`resi_units` VARCHAR(10) NULL DEFAULT NULL,
`commercial_units` VARCHAR(10) NULL DEFAULT NULL,
`num_floors` VARCHAR(10) NULL DEFAULT NULL,
`num_buildings` VARCHAR(10) NULL DEFAULT NULL,
`price_per_sq_ft` INT(11) NULL DEFAULT NULL,
`price_per_unit` INT(11) NULL DEFAULT NULL,
`property_type_id` INT(11) NULL DEFAULT NULL,
`property_type` VARCHAR(60) NULL DEFAULT NULL,
`long_lat_point` POINT NULL DEFAULT NULL,
PRIMARY KEY (`nc_id`),
INDEX `t_national_comps_latitude_longitude_index` (`latitude`, `longitude`),
INDEX `t_national_comps_property_city_index` (`property_city`),
INDEX `t_national_comps_property_state_index` (`property_state`),
INDEX `t_national_comps_sale_date_index` (`sale_date`),
INDEX `t_national_comps_point_index` (`long_lat_point`(25)),
INDEX `t_national_comps_reonomy_id_index` (`reonomy_property_id`),
INDEX `mailing_address_index` (`mailing_address`),
INDEX `mortgage_date_index` (`mortgage_date`),
INDEX `t_national_comps_lender_index` (`lender`),
INDEX `bank_id_index` (`bank_id`),
INDEX `street_num_and_zip` (`property_street_number`, `property_zip`)
);
EDIT
The reason I did not aggregate anything in the query, is because I have nothing to aggregate. I know that is not the primary use of group by, but it is commonly used as such, just to get one of each record.
I was able to speed up the query by forcing the use of an index on all 3 columns, like
select latitude, longitude, property_street_number, property_street_name,
property_city, property_state, property_zip from properties
USE INDEX (lat_long_state_index)
where latitude is not null and longitude is not null
group by property_state;
but I am still looking for more optimization.
Thanks to all for your help.