0

I have the following query:

SELECT DISTINCT
  branch.id,
  branch.name,
  SUM(CAST(COALESCE(NULLIF(REPLACE(dia_anterior.col_9, ',', '.'), ''), '0.0') AS double precision)) AS total_venta,
  SUM(CAST(COALESCE(NULLIF(REPLACE(dia_anterior.col_4, ',', '.'), ''), '0.0') AS double precision)) AS total_personas,
  SUM(CAST(COALESCE(NULLIF(REPLACE(ocupacion_dia_anterior.col_3, ',', '.'), ''), '0') AS double precision)) AS total_ocupacion
FROM branches AS branch
  INNER JOIN queries AS q_dia_anterior
    ON q_dia_anterior.query_structure_id = (SELECT id FROM query_structures WHERE query_structures.slug = 'dia-anterior')
  INNER JOIN queries AS q_ocupacion_dia_anterior
    ON q_ocupacion_dia_anterior.query_structure_id = (SELECT id FROM query_structures WHERE query_structures.slug = 'ocupacion-dia-anterior')
  INNER JOIN queries AS q_ventas_x_articulo_dia_anterior
    ON q_ventas_x_articulo_dia_anterior.query_structure_id = (SELECT id FROM query_structures WHERE query_structures.slug = 'ventas-x-articulo-dia-anterior')
  INNER JOIN branch_data AS dia_anterior
    ON dia_anterior.query_id = q_dia_anterior.id
      AND dia_anterior.branch_id = branch.id
  INNER JOIN branch_data AS ocupacion_dia_anterior
    ON ocupacion_dia_anterior.query_id = q_ocupacion_dia_anterior.id
      AND ocupacion_dia_anterior.branch_id = branch.id
  INNER JOIN branch_data AS ventas_x_articulo_dia_anterior
    ON ventas_x_articulo_dia_anterior.query_id = q_ventas_x_articulo_dia_anterior.id
      AND ventas_x_articulo_dia_anterior.branch_id = branch.id
WHERE
  branch.account_id = 1
GROUP BY
  branch.id,
  branch.name
ORDER BY
  2

When I group the results to sum the columns, looks like the values from different rows are concatenating, resulting in huge values.

Am I missing something?
Why am I getting the values summed up multiple times?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Saul Martínez
  • 920
  • 13
  • 28
  • I'm pretty sure you don't want `SELECT DISTINCT` if you're also doing a `GROUP BY`. – Sam Choukri Apr 08 '15 at 00:53
  • I'm actually getting the same result :/ – Saul Martínez Apr 08 '15 at 00:54
  • The `GROUP BY` in combination with the aggregate function `SUM()` is already returning a unique set of rows. So specifying `DISTINCT`, as you observed, doesn't alter the results, but it is an unnecessary distraction. It also causes the server to sort and unique your results unnecessarily. Sorry I couldn't help you with your main problem, though! – Sam Choukri Apr 08 '15 at 01:12
  • IIUC, "concatenation" is the wrong term here. This is rather confusing, since you also convert text to numeric data. Seems you mean "multiplication". – Erwin Brandstetter Apr 08 '15 at 04:06
  • You are right @ErwinBrandstetter – Saul Martínez Apr 08 '15 at 16:28

2 Answers2

2

Filling in with some educated guesses, this might be the query you want:

SELECT b.name
     , COALESCE(dia_actual.total_venta   , 0) AS total_venta
     , COALESCE(dia_actual.total_personas, 0) AS total_personas
     , COALESCE(ocupacion.total_ocupacion, 0) AS total_ocupacion
FROM   branches b
LEFT   JOIN LATERAL (
   SELECT sum(replace(NULLIF(bd.col_9, ''), ',', '.')::float) AS total_venta
        , sum(replace(NULLIF(bd.col_4, ''), ',', '.')::float) AS total_personas
   FROM   branch_data      bd
   JOIN   queries          q  ON q.id = bd.query_id
   JOIN   query_structures qs ON qs.id = q.query_structure_id
   WHERE  bd.branch_id = b.id  -- lateral reference
   AND    qs.slug = 'dia-actual'
   ) dia_actual ON true
LEFT   JOIN LATERAL (
   SELECT sum(replace(NULLIF(bd.col_3, ''), ',', '.')::float) AS total_ocupacion
   FROM   branch_data      bd
   JOIN   queries          q  ON q.id = bd.query_id
   JOIN   query_structures qs ON qs.id = q.query_structure_id
   WHERE  bd.branch_id = b.id  -- lateral reference
   AND    qs.slug = 'ocupacion-dia-anterior'
   ) ocupacion ON true
WHERE  b.account_id = 1
ORDER  BY  b.name;

Notes

  • Sum first, then join, to avoid multiplication of rows due to the proxy cross join.

  • Use LEFT JOIN to avoid losing rows it not values are found in one of the queries to the right.

  • However, since you select a single branch_id I switched to LEFT JOIN LATERAL ... ON true, assuming you are only using a small fraction of all rows.

  • No need for a more expensive CTE.

  • Also simplified your transformation from text to double precision.

Naturally, it would be better to store those numbers as a numeric data type to begin with - not as text.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for your response. I didn't know about `LEFT JOIN LATERAL ... ON true`. It actually reduces the execution time using this in combination with CTEs. The data is coming from different data sources, different POS software packages, different databases, that's why I'm storing strings only. – Saul Martínez Apr 08 '15 at 16:28
  • @SaulMartínez: It is almost *always* better to store timestamp data as timestamps. And I don't see how CTEs could help with performance here. – Erwin Brandstetter Apr 08 '15 at 17:00
0

Although I don't think this answer the question properly, I got the desired results with the following query:

WITH dia_actual AS (
SELECT
  branch_data.branch_id,
  CAST(COALESCE(NULLIF(REPLACE(branch_data.col_9, ',', '.'), ''), '0') AS double precision) AS total_venta,
  CAST(COALESCE(NULLIF(REPLACE(branch_data.col_4, ',', '.'), ''), '0') AS double precision) AS total_personas
FROM branch_data
  INNER JOIN queries AS query
    ON query.id = branch_data.query_id
  INNER JOIN query_structures AS query_structure
    ON query_structure.id = query.query_structure_id
WHERE
  query_structure.slug = 'dia-actual'
), ocupacion AS (
SELECT
  branch_data.branch_id,
  CAST(COALESCE(NULLIF(REPLACE(branch_data.col_3, ',', '.'), ''), '0') AS double precision) AS total_ocupacion
FROM branch_data
  INNER JOIN queries AS query
    ON query.id = branch_data.query_id
  INNER JOIN query_structures AS query_structure
    ON query_structure.id = query.query_structure_id
WHERE
  query_structure.slug = 'ocupacion-dia-anterior'
)
SELECT
  branch.name,
  SUM(dia_actual.total_venta) AS total_venta,
  SUM(dia_actual.total_personas) AS total_personas,
  SUM(ocupacion.total_ocupacion) AS total_ocupacion
FROM branches AS branch
  INNER JOIN dia_actual
    ON dia_actual.branch_id = branch.id
  INNER JOIN ocupacion
    ON ocupacion.branch_id = branch.id
WHERE
  branch.account_id = 1
GROUP BY
  branch.name
ORDER BY
  branch.name

Using CTEs the values are summed correctly.

Saul Martínez
  • 920
  • 13
  • 28