I am trying to take a table where strings are arranged in records that are clusters of three strings and return a query where the data is in one record.
The table definition is:
CREATE TABLE Xerox.FIM2 (
KPIN CHAR(18) NOT NULL COMMENT 'Part Number',
KSEQ DECIMAL(2, 0) NOT NULL COMMENT 'Sequence Number',
SDS1 CHAR(40) NOT NULL COMMENT 'Supplemental Description 1',
SDS2 CHAR(30) NOT NULL COMMENT 'Supplemental Description 2',
SDS3 CHAR(30) NOT NULL COMMENT 'Supplemental Description 3',
PRIMARY KEY (KPIN, KSEQ),
INDEX IDX_FIM2_KSEQ (KSEQ),
INDEX IDX_FIM2_SDS1 (SDS1),
INDEX IDX_FIM2_SDS2 (SDS2),
INDEX IDX_FIM2_SDS3 (SDS3),
INDEX UK_FIM2_KPIN (KPIN)
)
ENGINE = INNODB
AVG_ROW_LENGTH = 180
CHARACTER SET latin1
COLLATE latin1_swedish_ci
COMMENT = 'Supplemental Part Descriptions';
A given part number may have no records or up to 8 records. These eight records contain string data (descriptions, three per record) that I am trying to transform from:
KPIN KSEQ SDS1 SDS2 SDS3
R0205 1 COLD ROLLED A1008/A CS TYPE B MATTE FINISH OIL / EXPOSED MIN YIELD 30,000 RB 45-60
R0205 2 THICKNESS .032 TOL +.006/-.000 WIDTH 48.000 TOL +.188/-.000 LENGTH 84.875 TOL +/-.020
R0205 3 SQUARENESS TOL +/-.062 MAX LIFT WEIGHT 10,000 LBS <null>
to data like:
KPIN KSEQ DES1 DES2 DES3 DES4 DES5 DES6 DES7 DES8 DES9 DES10 DES11 DES12
R0205 1 COLD ROLLED A1008/A CS TYPE B MATTE FINISH OIL / EXPOSED MIN YIELD 30,000 RB 45-60 THICKNESS .032 TOL +.006/-.000 WIDTH 48.000 TOL +.188/-.000 LENGTH 84.875 TOL +/-.020 THICKNESS .032 TOL +.006/-.000 WIDTH 48.000 TOL +.188/-.000 LENGTH 84.875 TOL +/-.020 SQUARENESS TOL +/-.062 MAX LIFT WEIGHT 10,000 LBS <null><null><null><null>...
my SQL command that does this is:
SELECT FIM1.KPIN, FIM1.DES, p.DES1, p.DES2, p.DES3, p.DES4, p.DES5, p.DES6,
p.DES7, p.DES8, p.DES9, p.DES10, p.DES11, p.DES12,
p.DES13, p.DES14, p.DES15, p.DES16, p.DES17, p.DES18,
p.DES19, p.DES20, p.DES21, p.DES22, p.DES23, p.DES24
FROM FIM1
JOIN
(SELECT KPIN, KSEQ,
MAX(CASE WHEN (KSEQ = 1) THEN SDS1 END) AS DES1,
MAX(CASE WHEN (KSEQ = 1) THEN SDS2 END) AS DES2,
MAX(CASE WHEN (KSEQ = 1) THEN SDS3 END) AS DES3,
MAX(CASE WHEN (KSEQ = 2) THEN SDS1 END) AS DES4,
MAX(CASE WHEN (KSEQ = 2) THEN SDS2 END) AS DES5,
MAX(CASE WHEN (KSEQ = 2) THEN SDS3 END) AS DES6,
MAX(CASE WHEN (KSEQ = 3) THEN SDS1 END) AS DES7,
MAX(CASE WHEN (KSEQ = 3) THEN SDS2 END) AS DES8,
MAX(CASE WHEN (KSEQ = 3) THEN SDS3 END) AS DES9,
MAX(CASE WHEN (KSEQ = 4) THEN SDS1 END) AS DES10,
MAX(CASE WHEN (KSEQ = 4) THEN SDS2 END) AS DES11,
MAX(CASE WHEN (KSEQ = 4) THEN SDS3 END) AS DES12,
MAX(CASE WHEN (KSEQ = 5) THEN SDS1 END) AS DES13,
MAX(CASE WHEN (KSEQ = 5) THEN SDS2 END) AS DES14,
MAX(CASE WHEN (KSEQ = 5) THEN SDS3 END) AS DES15,
MAX(CASE WHEN (KSEQ = 6) THEN SDS1 END) AS DES16,
MAX(CASE WHEN (KSEQ = 6) THEN SDS2 END) AS DES17,
MAX(CASE WHEN (KSEQ = 6) THEN SDS3 END) AS DES18,
MAX(CASE WHEN (KSEQ = 7) THEN SDS1 END) AS DES19,
MAX(CASE WHEN (KSEQ = 7) THEN SDS2 END) AS DES20,
MAX(CASE WHEN (KSEQ = 7) THEN SDS3 END) AS DES21,
MAX(CASE WHEN (KSEQ = 8) THEN SDS1 END) AS DES22,
MAX(CASE WHEN (KSEQ = 8) THEN SDS2 END) AS DES23,
MAX(CASE WHEN (KSEQ = 8) THEN SDS3 END) AS DES24
FROM FIM2
GROUP BY KPIN
ORDER BY KPIN, KSEQ) as p
ON FIM1.KPIN=p.KPIN ORDER BY FIM1.KPIN
There are 118,552 records in table FIM1 (the main part number list)
There are 66,303 records in table FIM2 (all these descriptions)
62,163 of the records in table FIM1 have at least one description record in FIM2.
The query as written above returns 62,163 records in about 3 seconds. If I a change this to a LEFT JOIN instead of a JOIN, I would expect to get all 118,552 records with lots of nulls. That operation takes minutes and minutes to complete. I want to understand why there is such a huge difference between those two operations. It seems that the LEFT JOIN is forcing the engine to traverse the FIM2 table again and again, but then why does it not have to do that for the simple JOIN? I realize I am creating something like an temporary record for each item, but the simple join was still able to optimize it.
This row consolidation technique is based on concepts that are detailed here: How to pivot? How to convert multiple rows into one row with multiple columns?
My query plans: (for the JOIN):
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL (null) (null) (null) (null) 64512 Using temporary; Using filesort
1 PRIMARY FIM1 eq_ref PRIMARY,UK_FIM1_KPIN PRIMARY 18 p.KPIN 1
2 DERIVED FIM2 index (null) PRIMARY 19 (null) 64085 Using temporary; Using filesort
for the LEFT JOIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY FIM1 index (null) IDX_FIM1_DES 30 (null) 124199 Using index; Using temporary; Using filesort
1 PRIMARY <derived2> ALL (null) (null) (null) (null) 64512
2 DERIVED FIM2 index (null) PRIMARY 19 (null) 64085 Using temporary; Using filesort