
I'm storing AdWords report data in Postgres. Each report is stored in a table named Reports, which has a jsonb column named 'data'. Each report has JSON stored in its 'data' field that looks like this:

[
  {
    "match_type": "exact",
    "search_query": "gm hubcaps",
    "conversions": 2,
    "cost": 1.24
  },
  {
    "match_type": "broad",
    "search_query": "gm auto parts",
    "conversions": 34,
    "cost": 21.33
  },
  {
    "match_type": "phrase",
    "search_query": "silverdo headlights",
    "conversions": 63,
    "cost": 244.05
  }
]

What I want to do is query off these data hashes and sum up the total number of conversions for a given report. I've looked through the PostgreSQL docs and it looks like you can only really do calculations on hashes, not arrays of hashes like this. Is what I'm trying to do possible in Postgres? Do I need to make a temp table out of this array and do calculations off that? Or can I use a stored procedure?

I'm using PostgreSQL 9.4.
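
For reference, here's a minimal sketch of the relevant part of the table (other columns omitted):

CREATE TABLE reports (
   -- ... id and any other columns ...
   data jsonb   -- holds the array of row objects shown above
);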

EDIT: The reason I'm not just using a regular, normalized table is that this is just one example of how report data could be structured. In my project, reports have to allow arbitrary keys, because they are populated by users uploading CSVs with any columns they like. It's basically just a way to get around having arbitrarily many user-created tables.

    Maybe you should make it a permanent table, as this structure fits exactly the relational model and keeping it in JSON means every query is going to be more complex and less efficient. – Daniel Vérité Mar 19 '15 at 15:05
  • Table definition (`\d tbl` in psql) and your version of Postgres are *essential* to this question. Also: sum conversions *per report* or overall? – Erwin Brandstetter Mar 19 '15 at 15:10

2 Answers


You could use unnest:

select sum(conv) from
  (select (d ->> 'conversions')::int as conv from
    (select unnest(data) as d from <your table>) all_data
  ) all_conv

Disclaimer: I don't have Pg 9.4, so I couldn't test it myself.

EDIT: this assumes that the array you mentioned is a PostgreSQL array, i.e. that the data type of your data column is character varying[]. If the data is actually a json or jsonb array, you should be able to use json_array_elements / jsonb_array_elements instead of unnest (and extract the value with ->> before summing).
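
For the jsonb column described in the question, a sketch of that variant (untested) could be:

select sum((d ->> 'conversions')::int) as total_conversions
from  (select jsonb_array_elements(data) as d from reports) all_data;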

martin

What I want to do is query off these data hashes and sum up the conversions

The fastest way should be with jsonb_populate_recordset(). But you need a registered row type for it.

CREATE TEMP TABLE report_data (
--   match_type text    -- commented out, because we only need ..
-- , search_query text  -- .. conversions for this query
     conversions int
-- , cost numeric
);

A temp table is one way to register a row type ad hoc. More explanation in this related answer.
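
Alternatively, a permanent composite type also works as the registered row type (a minimal sketch; use either this or the temp table, since both would claim the name report_data):

CREATE TYPE report_data AS (conversions int);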

Assuming a table report with report_id as PK, for lack of information.

SELECT r.report_id, sum(d.conversions) AS sum_conversions
FROM   report r
LEFT   JOIN LATERAL jsonb_populate_recordset(null::report_data, r.data) d ON true
-- WHERE  r.report_id = 12345  -- only for given report?
GROUP  BY 1;
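
With the sample data from the question, that report's row would come out as sum_conversions = 99 (2 + 34 + 63).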

The LEFT JOIN ensures you get a result row even if data is NULL or the JSON array is empty.

For a sum from a single row in the underlying table, this is faster:

SELECT d.sum_conversions
FROM   report r
LEFT   JOIN LATERAL (
   SELECT sum(conversions) AS sum_conversions
   FROM   jsonb_populate_recordset(null::report_data, r.data)
   ) d ON true
WHERE  r.report_id = 12345;  -- enter report_id here

Alternative with jsonb_array_elements() (no need for a registered row type):

SELECT d.sum_conversions
FROM   report r
LEFT   JOIN LATERAL (
   SELECT sum((value->>'conversions')::int) AS sum_conversions
   FROM   jsonb_array_elements(r.data)
   ) d ON true
WHERE  r.report_id = 12345;  -- enter report_id here
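
If you also need per-report sums for all rows with this variant, the same expression can be grouped along the lines of the first query above (a sketch, untested):

SELECT r.report_id, sum((value->>'conversions')::int) AS sum_conversions
FROM   report r
LEFT   JOIN LATERAL jsonb_array_elements(r.data) ON true
GROUP  BY 1;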

Normally you would implement this as a plain, normalized table. I don't see the benefit of JSON here (except that your application seems to require it, as you added).
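
For comparison, such a normalized table could look roughly like this (column names taken from the sample data; the FK to report is an assumption):

CREATE TABLE report_row (
   report_id    int REFERENCES report,
   match_type   text,
   search_query text,
   conversions  int,
   cost         numeric
);

-- the per-report sum is then a plain aggregate:
SELECT report_id, sum(conversions) AS sum_conversions
FROM   report_row
GROUP  BY 1;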

Erwin Brandstetter
  • I hear you on the normalization. The reason I took this approach is that the project requires that users can upload any CSV document, regardless of how it's structured and what columns are in it. I was just using this AdWords report as an example. – Adam Kolkman Mar 19 '15 at 15:42
  • Also, thanks for pointing out the lack of details. I'll edit the question. – Adam Kolkman Mar 19 '15 at 15:42