4

I am trying to take a table where strings are arranged in records that are clusters of three strings and return a query where the data is in one record.

The table definition is:

CREATE TABLE Xerox.FIM2 (
  KPIN CHAR(18) NOT NULL COMMENT 'Part Number',
  KSEQ DECIMAL(2, 0) NOT NULL COMMENT 'Sequence Number',
  SDS1 CHAR(40) NOT NULL COMMENT 'Supplemental Description 1',
  SDS2 CHAR(30) NOT NULL COMMENT 'Supplemental Description 2',
  SDS3 CHAR(30) NOT NULL COMMENT 'Supplemental Description 3',
  PRIMARY KEY (KPIN, KSEQ),
  INDEX IDX_FIM2_KSEQ (KSEQ),
  INDEX IDX_FIM2_SDS1 (SDS1),
  INDEX IDX_FIM2_SDS2 (SDS2),
  INDEX IDX_FIM2_SDS3 (SDS3),
  INDEX UK_FIM2_KPIN (KPIN)
)
ENGINE = INNODB
AVG_ROW_LENGTH = 180
CHARACTER SET latin1
COLLATE latin1_swedish_ci
COMMENT = 'Supplemental Part Descriptions';

A given part number may have no records or up to 8 records. These eight records contain string data (descriptions, three per record) that I am trying to transform from:

KPIN  KSEQ  SDS1                                SDS2                            SDS3

R0205   1   COLD ROLLED A1008/A CS TYPE B       MATTE FINISH OIL / EXPOSED      MIN YIELD 30,000 RB 45-60
R0205   2   THICKNESS .032 TOL +.006/-.000      WIDTH 48.000 TOL +.188/-.000    LENGTH 84.875 TOL +/-.020
R0205   3   SQUARENESS TOL +/-.062              MAX LIFT WEIGHT 10,000 LBS      <null>

to data like:

KPIN  KSEQ  DES1                                DES2                            DES3                        DES4                                DES5                            DES6                        DES7                                DES8                            DES9                        DES10                               DES11                           DES12

R0205   1   COLD ROLLED A1008/A CS TYPE B       MATTE FINISH OIL / EXPOSED      MIN YIELD 30,000 RB 45-60   THICKNESS .032 TOL +.006/-.000      WIDTH 48.000 TOL +.188/-.000    LENGTH 84.875 TOL +/-.020   THICKNESS .032 TOL +.006/-.000      WIDTH 48.000 TOL +.188/-.000    LENGTH 84.875 TOL +/-.020   SQUARENESS TOL +/-.062              MAX LIFT WEIGHT 10,000 LBS      <null><null><null><null>...

my SQL command that does this is:

    SELECT FIM1.KPIN, FIM1.DES, p.DES1, p.DES2, p.DES3, p.DES4, p.DES5, p.DES6,
                                p.DES7, p.DES8, p.DES9, p.DES10, p.DES11, p.DES12,
                                p.DES13, p.DES14, p.DES15, p.DES16, p.DES17, p.DES18,
                                p.DES19, p.DES20, p.DES21, p.DES22, p.DES23, p.DES24
    FROM FIM1
    JOIN
    (SELECT  KPIN, KSEQ,
      MAX(CASE WHEN (KSEQ = 1) THEN SDS1 END) AS DES1,
      MAX(CASE WHEN (KSEQ = 1) THEN SDS2 END) AS DES2,
      MAX(CASE WHEN (KSEQ = 1) THEN SDS3 END) AS DES3,
      MAX(CASE WHEN (KSEQ = 2) THEN SDS1 END) AS DES4,
      MAX(CASE WHEN (KSEQ = 2) THEN SDS2 END) AS DES5,
      MAX(CASE WHEN (KSEQ = 2) THEN SDS3 END) AS DES6,
      MAX(CASE WHEN (KSEQ = 3) THEN SDS1 END) AS DES7,
      MAX(CASE WHEN (KSEQ = 3) THEN SDS2 END) AS DES8,
      MAX(CASE WHEN (KSEQ = 3) THEN SDS3 END) AS DES9,
      MAX(CASE WHEN (KSEQ = 4) THEN SDS1 END) AS DES10,
      MAX(CASE WHEN (KSEQ = 4) THEN SDS2 END) AS DES11,
      MAX(CASE WHEN (KSEQ = 4) THEN SDS3 END) AS DES12,
      MAX(CASE WHEN (KSEQ = 5) THEN SDS1 END) AS DES13,
      MAX(CASE WHEN (KSEQ = 5) THEN SDS2 END) AS DES14,
      MAX(CASE WHEN (KSEQ = 5) THEN SDS3 END) AS DES15,
      MAX(CASE WHEN (KSEQ = 6) THEN SDS1 END) AS DES16,
      MAX(CASE WHEN (KSEQ = 6) THEN SDS2 END) AS DES17,
      MAX(CASE WHEN (KSEQ = 6) THEN SDS3 END) AS DES18,
      MAX(CASE WHEN (KSEQ = 7) THEN SDS1 END) AS DES19,
      MAX(CASE WHEN (KSEQ = 7) THEN SDS2 END) AS DES20,
      MAX(CASE WHEN (KSEQ = 7) THEN SDS3 END) AS DES21,
      MAX(CASE WHEN (KSEQ = 8) THEN SDS1 END) AS DES22,
      MAX(CASE WHEN (KSEQ = 8) THEN SDS2 END) AS DES23,
      MAX(CASE WHEN (KSEQ = 8) THEN SDS3 END) AS DES24

    FROM FIM2
    GROUP BY KPIN
    ORDER BY KPIN, KSEQ) as p
    ON FIM1.KPIN=p.KPIN ORDER BY FIM1.KPIN

There are 118,552 records in table FIM1 (the main part number list)

There are 66,303 records in table FIM2 (all these descriptions)

62,163 of the records in table FIM1 have at least one description record in FIM2.

The query as written above returns 62,163 records in about 3 seconds. If I a change this to a LEFT JOIN instead of a JOIN, I would expect to get all 118,552 records with lots of nulls. That operation takes minutes and minutes to complete. I want to understand why there is such a huge difference between those two operations. It seems that the LEFT JOIN is forcing the engine to traverse the FIM2 table again and again, but then why does it not have to do that for the simple JOIN? I realize I am creating something like an temporary record for each item, but the simple join was still able to optimize it.

This row consolidation technique is based on concepts that are detailed here: How to pivot? How to convert multiple rows into one row with multiple columns?

My query plans: (for the JOIN):

id  select_type table       type        possible_keys           key         key_len ref     rows    Extra
1   PRIMARY     <derived2>  ALL         (null)                  (null)      (null)  (null)  64512   Using temporary; Using filesort
1   PRIMARY     FIM1        eq_ref      PRIMARY,UK_FIM1_KPIN    PRIMARY     18      p.KPIN  1   
2   DERIVED     FIM2        index       (null)                  PRIMARY     19      (null)  64085   Using temporary; Using filesort

for the LEFT JOIN:

id  select_type table       type        possible_keys   key             key_len     ref     rows    Extra
1   PRIMARY     FIM1        index       (null)          IDX_FIM1_DES    30          (null)  124199  Using index; Using temporary; Using filesort
1   PRIMARY     <derived2>  ALL         (null)          (null)          (null)      (null)  64512   
2   DERIVED     FIM2        index       (null)          PRIMARY         19          (null)  64085   Using temporary; Using filesort
Community
  • 1
  • 1
Mike
  • 1,274
  • 10
  • 24
  • Just out of morbid curiosity, if you remove the order by do you get your results when using the left join faster? (remove the inner order by first) then try (I really don't think you want that order by there in the first place). then remove the outer. and try again. You only need the outer results ordered the inner is likely adding overhead. In addition I would add a group by KSEQ to the inner select or you could get an unanticipated kseq value. – xQbert Dec 05 '14 at 17:04
  • @xQbert I did try your suggestion, but even with no ORDER BYs it still ground on and on. – Mike Dec 05 '14 at 19:04

1 Answers1

3

Ah, the joys of denormalized data sets. It's not what you asked, but could you redo this FIM2 table like this, so there's just one item in each row rather than three? Your life would get easier if you did this.

KPIN  KSEQ  KFLD  ATTRIBUTE
R0205   1      1   COLD ROLLED A1008/A CS TYPE B
R0205   1      2   MATTE FINISH OIL / EXPOSED
R0205   1      3   MIN YIELD 30,000 RB 45-60
R0205   2      1   THICKNESS .032 TOL +.006/-.000
R0205   2      2   WIDTH 48.000 TOL +.188/-.000
R0205   2      3   LENGTH 84.875 TOL +/-.020
R0205   3      1   SQUARENESS TOL +/-.062
R0205   3      2   MAX LIFT WEIGHT 10,000 LBS      

But, that's not what you asked. You want to improve your report program so it can LEFT JOIN these two tables in a reasonable time.

It's important to break down your query. First, you're trying to LEFT JOIN the FIM1 and FIM2 tables, and then pivot the result set to expand stuff to one line.

Your approach, functionally correct, was to do the pivoting first, then the joining. Let's try another approach, first joining, then pivoting. That should reduce the number crunching required to satisfy the query.

     SELECT ff.KPIN, ff.DES,
            a.SDS1 AS DES1,
            a.SDS2 AS DES2,
            a.SDS3 AS DES3,
            b.SDS1 AS DES4,
            b.SDS2 AS DES5,
            b.SDS3 AS DES6,
            c.SDS1 AS DES7,
            c.SDS2 AS DES8, ...
            h.SDS1 AS DES23,
            h.SDS2 AS DES24
       FROM FIM1 AS ff
  LEFT JOIN FIM2 AS a ON ff.KPIN = a.KPIN AND a.KSEQ=1
  LEFT JOIN FIM2 AS b ON ff.KPIN = b.KPIN AND b.KSEQ=2
  LEFT JOIN FIM2 AS c ON ff.KPIN = c.KPIN AND c.KSEQ=3
  LEFT JOIN FIM2 AS d ON ff.KPIN = d.KPIN AND d.KSEQ=4
  LEFT JOIN FIM2 AS e ON ff.KPIN = e.KPIN AND e.KSEQ=5
  LEFT JOIN FIM2 AS f ON ff.KPIN = f.KPIN AND f.KSEQ=6
  LEFT JOIN FIM2 AS g ON ff.KPIN = g.KPIN AND g.KSEQ=7
  LEFT JOIN FIM2 AS h ON ff.KPIN = h.KPIN AND h.KSEQ=8

If you index FIM2 with the compound index (KSEQ, KPIN) this will probably be quite fast. It does admittedly look hairy with the eight LEFT JOIN operations, but your MAX() ... GROUP BY query is also hairy. You should also try the index (KSEQ,KPIN) to see which is faster.

Mike
  • 1,274
  • 10
  • 24
O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • Ah, the successful transmission of sarcasm over the internet. ;-) – Strawberry Dec 05 '14 at 17:54
  • Mild irony, brother, mild irony. Sarcasm is reserved for people who screw up medical records. :-) – O. Jones Dec 05 '14 at 17:57
  • Excellent answer changing the order of events in a way I would not have thought of. Really sped things up. Few small things I had to correct: a.KSEQ=1 a.KSEQ=2 ... should be a.KSEQ=1 b.KSEQ=2 c.KSEQ=3 ... Also f is not a unique alias so I changed FIM1 as f to FIM1 as ff. I will update your answer then accept it if you do not object. – Mike Dec 05 '14 at 18:19
  • Oh, darn, sorry for the bug in the query. You are right about `AS ff` and the aliases; thanks for the correction. – O. Jones Dec 05 '14 at 18:32