I was wondering if anyone had some advice for me regarding a histogram-generating query. I have a query that I like (in that it works), but it is extremely slow. Here is the background:
I have a table of metadata, a table of data values where one row in meta_data is a key-row for many (perhaps several thousand) rows in data_values, and a table of histogram bin information:
create table meta_data (
id int not null primary key,
name varchar(100),
other_data char(10)
);
create table data_values (
id int not null primary key,
meta_data_id int not null,
data_value real
);
create table histogram_bins (
id int not null primary key,
bin_min real,
bin_max real,
bin_center real,
bin_size real
);
And a query that creates the histogram:
SELECT md.name AS `Name`,
md.other_data AS `OtherData`,
hist.bin_center AS `Bin`,
SUM(data.data_value BETWEEN hist.bin_min AND hist.bin_max) AS `Frequency`
FROM histogram_bins hist
LEFT JOIN data_values data ON 1 = 1
LEFT JOIN meta_data md ON md.id = data.meta_data_id
GROUP BY md.id, `Bin`;
In an earlier version of this query, the BETWEEN ... AND logical statement was down in the JOIN (replacing 1 = 1), but then I would only receive histogram rows with non-zero frequency. I need rows for all of the bins (even the zero-frequency ones), for analysis purposes.
Its pretty darn slow, to the tune of 10-15 minutes or so. The data_values table has about 7.9 million rows, and meta_data weighs in at 15,900 rows -- so maybe it is just going to take a long time!
Thanks very much!