
This is similar to this post, but different enough that I'm asking again. The proposed solutions would also be impractically slow in my case.

Note: I am also working with really large datasets in Redshift.

I am trying to calculate "scores" for a statistical model, which requires that I calculate a dot product for every row in a data set. In particular, I would like to create a new column in my table "SCORE" that multiplies the values for each column in the first table by a pre-specified value for that column.

For example, I have the following tables:

INITIAL

+--------+------+-------+
|  NAME  | COL0 | COL1  |
+--------+------+-------+
| AL     | 1    | 4     |
| BILL   | 2    | 5     |
| CATHY  | 3    | 6     |
+--------+------+-------+

ESTIMATES

+--------+----------+
|  NAME  | ESTIMATE |
+--------+----------+
| COL0   | 5        |
| COL1   | 10       |
+--------+----------+

and I want a final table,

FINAL

+--------+------+-------+-------+
|  NAME  | COL0 | COL1  | SCORE |
+--------+------+-------+-------+
| AL     | 1    | 4     |  45   |
| BILL   | 2    | 5     |  60   |
| CATHY  | 3    | 6     |  75   |
+--------+------+-------+-------+

For the final table, the value in each column of the INITIAL table is multiplied by the corresponding value listed in the ESTIMATES table, and the products are summed. For example, the SCORE for AL is 1 * 5 + 4 * 10 = 45, where the 5 and 10 come from the ESTIMATES table.

Currently, I am creating the SCORE column by writing SQL that hard-codes the numbers from the ESTIMATES table.

d_a_c321
    You need to be prepared for "impractically slow" no matter how it's done if your datasets are truly large and you have many columns. – user2338816 Jun 27 '14 at 23:52

4 Answers


You can do this by joining the tables and then using conditional aggregation:

select i.name,
       max(i.col0) as col0,
       max(i.col1) as col1,
       (max(i.col0) * max(case when e.name = 'COL0' then estimate end) +
        max(i.col1) * max(case when e.name = 'COL1' then estimate end)
       ) as score
from initial i cross join
     estimates e
group by i.name;
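Not part of the original answer, but the cross-join/conditional-aggregation approach is easy to sanity-check on a miniature copy of the question's tables using SQLite from Python (table names and values taken from the question's example):

```python
import sqlite3

# Build the question's example tables in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE initial (name TEXT, col0 INTEGER, col1 INTEGER);
    INSERT INTO initial VALUES ('AL', 1, 4), ('BILL', 2, 5), ('CATHY', 3, 6);
    CREATE TABLE estimates (name TEXT, estimate INTEGER);
    INSERT INTO estimates VALUES ('COL0', 5), ('COL1', 10);
""")

# Cross join, then use conditional aggregation to pick each column's estimate.
rows = con.execute("""
    SELECT i.name,
           MAX(i.col0) AS col0,
           MAX(i.col1) AS col1,
           (MAX(i.col0) * MAX(CASE WHEN e.name = 'COL0' THEN estimate END) +
            MAX(i.col1) * MAX(CASE WHEN e.name = 'COL1' THEN estimate END)) AS score
    FROM initial i CROSS JOIN estimates e
    GROUP BY i.name
    ORDER BY i.name
""").fetchall()
print(rows)
```

This reproduces the question's FINAL table, including SCORE = 45, 60, 75.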
Gordon Linoff

Assuming the ESTIMATES table always pivots down to a single row via CASE...

SELECT i.name, i.col0, i.col1, (pest.mcol0 * i.col0 + pest.mcol1 * i.col1) AS score
FROM Initial i
CROSS JOIN
  (SELECT
     MAX(CASE name WHEN 'COL0' THEN estimate END) AS mcol0,
     MAX(CASE name WHEN 'COL1' THEN estimate END) AS mcol1
   FROM estimates) pest -- pivot estimates into one row
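A note on why this variant can be cheaper on large tables: ESTIMATES is collapsed to a single row once, and only that one row is joined to INITIAL, so no GROUP BY over the big table is needed. A small SQLite check of the same pivot-then-join shape (demo data copied from the question):

```python
import sqlite3

# Recreate the question's example tables in memory.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE initial (name TEXT, col0 INTEGER, col1 INTEGER);
    INSERT INTO initial VALUES ('AL', 1, 4), ('BILL', 2, 5), ('CATHY', 3, 6);
    CREATE TABLE estimates (name TEXT, estimate INTEGER);
    INSERT INTO estimates VALUES ('COL0', 5), ('COL1', 10);
""")

# Pivot estimates into one row first, then cross join it to every initial row.
rows = con.execute("""
    SELECT i.name, i.col0, i.col1,
           pest.mcol0 * i.col0 + pest.mcol1 * i.col1 AS score
    FROM initial i
    CROSS JOIN (SELECT MAX(CASE name WHEN 'COL0' THEN estimate END) AS mcol0,
                       MAX(CASE name WHEN 'COL1' THEN estimate END) AS mcol1
                FROM estimates) pest
    ORDER BY i.name
""").fetchall()
print(rows)
```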
xQbert

You can do that by joining the tables as:

SELECT i.NAME, i.COL0, i.COL1,
       (MAX(i.COL0) * MAX(CASE WHEN e.name = 'COL0' THEN estimate END) +
        MAX(i.COL1) * MAX(CASE WHEN e.name = 'COL1' THEN estimate END)) AS SCORE
FROM INITIAL i
CROSS JOIN ESTIMATES e
GROUP BY i.NAME, i.COL0, i.COL1
ORDER BY i.NAME;

Please refer to this SQLFiddle link for reference.

09Q71A0548

For future-comers, here is a PostgreSQL implementation of a vector dot product:

CREATE OR REPLACE FUNCTION public.dot_product(IN vector1 double precision[], IN vector2 double precision[])
    RETURNS double precision
AS $BODY$
BEGIN
    -- Multiply the arrays element-wise with unnest(), then sum the products.
    RETURN (SELECT sum(mul)
            FROM (SELECT v1e * v2e AS mul
                  FROM unnest(vector1, vector2) AS t(v1e, v2e)) AS products);
END;
$BODY$ LANGUAGE plpgsql;
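For readers who want to verify the expected behavior outside the database, the same multiply-and-sum is a one-liner in Python (a minimal sketch, not part of the original answer). Using AL's row from the question, `[1, 4] · [5, 10]` should give 45:

```python
def dot_product(vector1, vector2):
    # Sum of element-wise products, mirroring the plpgsql function above.
    return sum(a * b for a, b in zip(vector1, vector2))

print(dot_product([1.0, 4.0], [5.0, 10.0]))  # 45.0
```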
Shadi