-1

I'm new to SQL and i was trying to make this query better since it's very slow.

I have a SQL Server table called Measurement2 with columns testPointKey(foreign key for parent table of type int auto-incrementing), dataType (dataType is of type TEXT), and dataValue (of type float). I'm trying to select data from this table based on multiple dataType values, and I'd like to have the results in a single row with each dataType value as a separate column. Here is create for the tables(note not the full schema):

CREATE TABLE TestPoint2 (
    recordKey           INTEGER     NOT NULL PRIMARY KEY,
    testSessionKey      INTEGER     NOT NULL REFERENCES TestSession2(recordKey) ON DELETE CASCADE,
    voltage             FLOAT,
    amps                FLOAT,
    phase               FLOAT,
    seconds             INTEGER,
    iterations          INTEGER,
);

CREATE TABLE Measurement2 (
    id                  INTEGER     NOT NULL PRIMARY KEY,
    testPointKey        INTEGER     NOT NULL REFERENCES TestPoint2(recordKey) ON DELETE CASCADE,
    mIndex              INTEGER,
    dataType            TEXT,
    dataValue           FLOAT,
);

Here's an example of the desired output:

hp tempHp keithley tempKeithley va vc vavc ia ic phaseVc phaseIa phaseIc tempVaa tempVb tempVc tempIa tempIb tempIc tempCta tempCtb tempCtc tempSom
120 30 10 30 50 40 90 20 30 60 20 10 20 30 40 50 60 70 80 90 100 110

I've tried using multiple instances of Measurement2 and joining them on testPointKey, like this:

This returns the correct output but it's very slow, and it would get worse the more different data types i have.

SELECT t1.dataValue AS hp, t2.dataValue AS tempHp, t3.dataValue AS keithley, t4.dataValue AS tempKeithley, t5.dataValue AS va, t6.dataValue AS vc, t7.dataValue AS vavc, t8.dataValue AS ia, t9.dataValue AS ic, t10.dataValue AS phaseVc, t11.dataValue AS phaseIa, t12.dataValue AS phaseIc, t13.dataValue AS tempVaa, t14.dataValue AS tempVb, t15.dataValue AS tempVc, t16.dataValue AS tempIa, t17.dataValue AS tempIb, t18.dataValue AS tempIc, t19.dataValue AS tempCta, t20.dataValue AS tempCtb, t21.dataValue AS tempCtc, t22.dataValue AS tempSom
FROM Measurement2 t1 
INNER JOIN Measurement2 t2 ON t1.testPointKey = t2.testPointKey 
INNER JOIN Measurement2 t3 ON t1.testPointKey = t3.testPointKey 
INNER JOIN Measurement2 t4 ON t1.testPointKey = t4.testPointKey 
INNER JOIN Measurement2 t5 ON t1.testPointKey = t5.testPointKey 
INNER JOIN Measurement2 t6 ON t1.testPointKey = t6.testPointKey 
INNER JOIN Measurement2 t7 ON t1.testPointKey = t7.testPointKey 
INNER JOIN Measurement2 t8 ON t1.testPointKey = t8.testPointKey 
INNER JOIN Measurement2 t9 ON t1.testPointKey = t9.testPointKey 
INNER JOIN Measurement2 t10 ON t1.testPointKey = t10.testPointKey 
INNER JOIN Measurement2 t11 ON t1.testPointKey = t11.testPointKey 
INNER JOIN Measurement2 t12 ON t1.testPointKey = t12.testPointKey 
INNER JOIN Measurement2 t13 ON t1.testPointKey = t13.testPointKey 
INNER JOIN Measurement2 t14 ON t1.testPointKey = t14.testPointKey 
INNER JOIN Measurement2 t15 ON t1.testPointKey = t15.testPointKey 
INNER JOIN Measurement2 t16 ON t1.testPointKey = t16.testPointKey 
INNER JOIN Measurement2 t17 ON t1.testPointKey = t17.testPointKey 
INNER JOIN Measurement2 t18 ON t1.testPointKey = t18.testPointKey 
INNER JOIN Measurement2 t19 ON t1.testPointKey = t19.testPointKey 
INNER JOIN Measurement2 t20 ON t1.testPointKey = t20.testPointKey 
INNER JOIN Measurement2 t21 ON t1.testPointKey = t21.testPointKey 
INNER JOIN Measurement2 t22 ON t1.testPointKey = t22.testPointKey 
INNER JOIN TestPoint2 tp1 ON tp1.recordKey = t22.testPointKey
WHERE CAST(t1.dataType as VARCHAR) = 'hp' 
  AND CAST(t2.dataType as VARCHAR) = 'tempHp'
  AND CAST(t3.dataType as VARCHAR) = 'keithley' 
  AND CAST(t4.dataType as VARCHAR) = 'tempKeithley' 
  AND CAST(t5.dataType as VARCHAR) = 'va' 
  AND CAST(t6.dataType as VARCHAR) = 'vc' 
  AND CAST(t7.dataType as VARCHAR) = 'vavc' 
  AND CAST(t8.dataType as VARCHAR) = 'ia' 
  AND CAST(t9.dataType as VARCHAR) = 'ic'
  AND CAST(t10.dataType as VARCHAR) = 'phaseVc'
  AND CAST(t11.dataType as VARCHAR) = 'phaseIa'
  AND CAST(t12.dataType as VARCHAR) = 'phaseIc'
  AND CAST(t13.dataType as VARCHAR) = 'tempVaa'
  AND CAST(t14.dataType as VARCHAR) = 'tempVb'
  AND CAST(t15.dataType as VARCHAR) = 'tempVc'
  AND CAST(t16.dataType as VARCHAR) = 'tempIa'
  AND CAST(t17.dataType as VARCHAR) = 'tempIb'
  AND CAST(t18.dataType as VARCHAR) = 'tempIc'
  AND CAST(t19.dataType as VARCHAR) = 'tempCta'
  AND CAST(t20.dataType as VARCHAR) = 'tempCtb'
  AND CAST(t21.dataType as VARCHAR) = 'tempCtc'
  AND CAST(t22.dataType as VARCHAR) = 'tempSom';
SoMZeY
  • 7
  • 5
  • 8
    "dataType is of type `TEXT`" - why? These are short strings. `TEXT` is a **deprecated** datatype for much longer strings than you seem to be using. If you had an index on `testPointKey,dataType` and no `cast` then the join query might be an OK alternative to pivoting - but text can't be a key column in an index. – Martin Smith Apr 20 '23 at 17:41
  • What is the complete DDL which created `Measurement2` ? – Luuk Apr 20 '23 at 17:49
  • 3
    Conditional aggregation would work, but you'll need to fix your design to use a data type that has been supported in the last 18 years; perhaps `int` and `varchar(12)` would be appropriate here. – Thom A Apr 20 '23 at 17:50
  • [paste the plan](https://www.brentozar.com/pastetheplan/?id=SyQihgJQ2), I need a wider screen...... – Luuk Apr 20 '23 at 17:56
  • I would also suggest using an exact datatype like numeric instead of float. – Sean Lange Apr 20 '23 at 18:29
  • @SeanLange: Using something else then FLOAT will not improve performance. (but might have other impacts which are not talked about (out-of-scope ?) in this question) – Luuk Apr 20 '23 at 19:12
  • @Luuk I realize it won't help performance. But from what the OP is storing is looks like float is a poor choice. – Sean Lange Apr 20 '23 at 19:35

2 Answers2

5

One thing I would do is move the conditional expressions from the WHERE clause to each live with their appropriate JOIN (and use better alias names). It won't do much if anything for performance, but it will make the query easier to read and maintain over time. Here's a shortened example:

SELECT m.dataValue AS hp, tHP.dataValue AS tempHp, keithly.dataValue AS keithley
FROM Measurement2 m
LEFT JOIN Measurement2 tHP ON tHP.testPointKey = m.testPointKey
    AND CAST(tHP.dataType as VARCHAR) = 'tempHp'
LEFT JOIN Measurement2 keithly ON keithley.testPointKey = m.testPointKey
    AND CAST(keithly.dataType as VARCHAR) = 'keithley'
WHERE CAST(m.dataType as VARCHAR) = 'hp' 

Note the use of LEFT JOIN instead of INNER, as the latter would omit the entire row (all values) if a single item were accidentally missed. This only works when we also move the conditional expressions to live the JOIN.

Separate from that, I agree with the comments about the data type of, well, dataType. The need to use a CAST() here is profoundly awful for performance, as it not only needs to run this same cast repeatedly for all values in the table, but it also breaks any possibility of using an index on the column.

The other thing you can do is use a GROUP BY expression to roll these rows up per testPointKey. Then put a CASE expression in an aggregate function like SUM() for each desired item. This is called conditional aggregation. Here's a shortened example:

SELECT 
    SUM(CASE WHEN CAST(m.dataType as VARCHAR) = 'hp' THEN m.dataValue END) As Hp
   ,SUM(CASE WHEN CAST(m.dataType as VARCHAR) = 'tempHp' THEN m.dataValue END) As tempHp
   ,SUM(CASE WHEN CAST(m.dataType as VARCHAR) = 'keithly' THEN m.dataValue END) As keithly
   -- ...
FROM Measurement2 m
GROUP BY m.testPointKey

Again, fixing dataType to avoid the cast (because what you have now really is broken) would make a profound improvement in the performance.

Finally, this resembles what is known as an Entity-Attribute-Value (EAV) schema pattern. This pattern has its place, especially in the area of plugins and extensibility, but overall it's considered an anti-pattern: that is, something to avoid until you absolutely need it. You will do MUCH better if you can design the table so it more resembles the desired output in the first place.

Joel Coehoorn
  • 399,467
  • 113
  • 570
  • 794
  • Conditional aggregation seems the most appropriate method, because this like data log system for for (electric) machine, with lot of data. Tables would be big size. 22-x joins on this tables - would be slowly everytime. Additionally, i presume grouping by m.testPointKey and m.mIndex. – ValNik Apr 20 '23 at 19:09
  • @ValNik - with the right datatypes and indexes the multiple joins would be fine https://stackoverflow.com/a/7449213/73226. Potentially each could seek exactly the right rows and just have a load of chained merge joins – Martin Smith Apr 20 '23 at 19:21
  • @Martin Smith, thank you for remark and link. It was convincing. I don't like PIVOT in terms of performance, but it's still worth looking deeper – ValNik Apr 20 '23 at 21:21
0

Due to the helpful comments/answers I found a good alternative in a from of conditional aggregation. With some minor tweaks made (and some more needed such as changing the TEXT to VARCHAR to avoid casting), here is updated code:

WITH
    TestPoints AS
    (
        SELECT * FROM MyTestPoints WHERE sessionKey = '9' --custom number
    ),

    UnfilteredMeasurements AS
    (
        SELECT * FROM MyMeasurements WHERE pointKey IN (SELECT recordKey FROM TestPoints)
    ),

    FilteredMeasurements AS
    (
        SELECT
            pointKey,
            MAX(index) as index,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'hp'              THEN myValue ELSE NULL END) AS hp,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempHp'          THEN myValue ELSE NULL END) AS tempHp,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'va'              THEN myValue ELSE NULL END) AS va,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'vc'              THEN myValue ELSE NULL END) AS vc,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'vavc'            THEN myValue ELSE NULL END) AS vavc,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'ia'              THEN myValue ELSE NULL END) AS ia,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'ic'              THEN myValue ELSE NULL END) AS ic,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'phaseVc'         THEN myValue ELSE NULL END) AS phaseVc,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'phaseIa'         THEN myValue ELSE NULL END) AS phaseIa,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'phaseIc'         THEN myValue ELSE NULL END) AS phaseIc,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempVa'          THEN myValue ELSE NULL END) AS tempVa,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempVb'          THEN myValue ELSE NULL END) AS tempVb,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempVc'          THEN myValue ELSE NULL END) AS tempVc,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempIa'          THEN myValue ELSE NULL END) AS tempIa,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempIb'          THEN myValue ELSE NULL END) AS tempIb,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempIc'          THEN myValue ELSE NULL END) AS tempIc,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempCta'         THEN myValue ELSE NULL END) AS tempCta,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempCtb'         THEN myValue ELSE NULL END) AS tempCtb,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempCtc'         THEN myValue ELSE NULL END) AS tempCtc,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'tempSom'         THEN myValue ELSE NULL END) AS tempSom,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'myDate'          THEN myValue ELSE NULL END) AS myDate,
            MAX(CASE WHEN CAST(myType AS VARCHAR) = 'myTime'          THEN myValue ELSE NULL END) AS myTime
        FROM UnfilteredMeasurements
        GROUP BY pointKey
    ),
Luuk
  • 12,245
  • 5
  • 22
  • 33
SoMZeY
  • 7
  • 5