1

I have a crosstab() query like the following:

SELECT *
FROM crosstab(
 'SELECT row_name, extra1, extra2..., another_table.category, value
  FROM   table t
  JOIN   another_table ON t.field_id = another_table.field_id
  WHERE  t.field = certain_value AND t.extra1 = val1
  ORDER  BY row_name ASC',
 'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)

Simplified example, the actual query is really complex and contains important information. The above query returns N result rows after filtering with table.extra1 = val1.

When I change the query as follows:

SELECT *
FROM crosstab(
 'SELECT row_name, extra1, extra2..., another_table.category, value
  FROM   table t
  JOIN   another_table ON t.field_id = another_table.field_id
  WHERE  t.field = certain_value AND t.extra1 IN (val1, ...) --> more values
  ORDER  BY row_name ASC',
 'SELECT category_name FROM category_name WHERE field = certain_value'
) AS ct(row_name text, extra1 text, extra2 text, ...)
WHERE extra1 = val1; --> condition on the result

Added more possible values table.extra1 IN (val1, ...) and a final condition WHERE extra1 = val1. Now I get fewer rows than from the original one. To make it worse, if I add yet more values to IN (val1, ...), I get yet fewer rows. Why is that?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228

1 Answers1

1

extra1, extra2, ... are "extra columns" in crosstab terminology.
The manual for the tablefunc module explains the rules:

It may also have one or more “extra” columns. The row_name column must be first. The category and value columns must be the last two columns, in that order. Any columns between row_name and category are treated as “extra”. The “extra” columns are expected to be the same for all rows with the same row_name value.

And further down:

The output row_name column, plus any “extra” columns, are copied from the first row of the group.

Bold emphasis on key parts by me.

You only sort by row_name:

ORDER  BY row_name ASC

Does not matter in the first example where you filter with:

WHERE ... t.extra1 = 'val1'  -- single quotes by me

All input row have extra1 = 'val1' anyway. But it matters in the second example where you filter with:

WHERE ... t.extra1 IN('val1', ...) --> More values

Now, the first bolded requirement above is violated for the extra column extra1. While the sort order of the first input query is non-deterministic, resulting values for the "extra" column extra1 are picked arbitrarily. The more possible values for extra1, the fewer rows will end up having 'val1': that's what you observed.

You can still make it work: to report extra1 = 'val1' for every row_name that has at least one of those, change the ORDER BY to:

ORDER  BY row_name, (extra1 <> 'val1')

Sorts 'val1' on top. Explanation for that boolean expression (with links to more):

Other "extra" columns are still chosen arbitrarily while the sort order is not deterministic.

Crosstab basics:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I still have some problems with multiple values. For example, if I filter `extra1` with `val2`, I get 100 results. However, if I use `...t.extra1 IN('val1', 'val2')` and `...(extra1 <> 'val1'), (extra1 <> 'val2')`, I get different result sizes for those which have extra1 with 'val2', around `80` – jesusgonzalezrivera Jul 26 '19 at 11:41
  • You would `ORDER BY row_name, (extra1 <> 'val2')` in this case to get 'val2' wherever available. (Drop `(extra1 <> 'val1')` from the list.) – Erwin Brandstetter Jul 26 '19 at 13:49
  • The problem is that `extra1` can have one value from a list `['value1', 'value2', 'value3'...'valueN']` of permitted values. The normal case is to permit filtering with multiple values `['value1', 'value2']` and not just one. Is still possible to achieve it with `<>` operator? – jesusgonzalezrivera Jul 26 '19 at 13:56
  • Please ask a new question with all necessary details. Comments are not the place ... (You can always link to this one for context, and drop a link back in the comments here to get my attention.) – Erwin Brandstetter Jul 26 '19 at 13:58
  • You are right :). Here appears a new question https://stackoverflow.com/questions/57221777/unexpected-effect-of-filtering-on-result-from-crosstab-query-with-multiple-val – jesusgonzalezrivera Jul 26 '19 at 14:09