13

Is there a way to group by a unique (primary) key, essentially giving an implicit guarantee that the other columns from that table will be well-defined?

SELECT myPrimaryKey, otherThing
FROM myTable
GROUP BY myPrimaryKey

I know that I can add the other columns to the statement (GROUP BY myPrimaryKey,otherThing), but I'm trying to avoid that. If you're curious why, read on:


I have a statement which is essentially doing this:

SELECT nodes.node_id, nodes.node_label, COUNT(1)
FROM {a couple of joined tables}
INNER JOIN nodes USING (node_id)
GROUP BY nodes.node_id, nodes.node_label

which works fine, but is a bit slow in MySQL. If I remove nodes.node_label from the GROUP BY, it runs about 10x faster (according to EXPLAIN, this is because one of the earlier joins starts using an index when previously it didn't).

We're in the process of migrating to Postgres, so all new statements are supposed to be compatible with both MySQL and Postgres when possible. Now in Postgres, the original statement runs fast, but the new statement (with the reduced group by) won't run (because Postgres is stricter). In this case, it's a false error because the statement is actually well-defined.

Is there a syntax I can use which will let the same statement run in both platforms, while letting MySQL use just one column in the group by for speed?

Dave
  • 44,275
  • 12
  • 65
  • 105
  • 3
    `In this case, it's a false error because the statement is actually well-defined.` No no no, MySql accepts weirds things for GROUP BY (with unpredictable results), all other DBMS force you to get predictable results (which is usually usefull). To find a solution, I would rather work on index management, to avoid MySql slowness ! – Raphaël Althaus Jun 05 '14 at 14:39
  • 1
    @RaphaëlAlthaus usually that's true, but grouping by the primary key (or any `UNIQUE` key) ensures that all the other values within the same table are well-defined. – Dave Jun 05 '14 at 14:41
  • but I doubt that it's how a dbms works. I don't think that query scanning and parsing checks for primary keys / unique keys. That comes after (query optimization / execution strategy)... – Raphaël Althaus Jun 05 '14 at 14:49
  • 1
    By the way, does the execution plan change if you do `select nodes.node_id, MIN(nodes.node_label), count(1)... GROUP BY nodes.node_id` : which would be accepted by both dbms ? – Raphaël Althaus Jun 05 '14 at 14:54
  • @RaphaëlAlthaus that worked perfectly. It's not quite as fast in MySQL as the version which just has the id and no min, but it's within 20% (it's using the index I wanted). Do you want to post an answer? – Dave Jun 05 '14 at 14:58
  • Well, I really think you should look for index management, rather than this "trick" ;) – Raphaël Althaus Jun 05 '14 at 15:04
  • Unfortunately, all the indexes are set up correctly already (which is why it works quickly when the planner gets it right and why it works quickly in Postgres). Even the label column is indexed. It's just that MySQL's planner seems to get confused by the statement. – Dave Jun 05 '14 at 15:22
  • Oh right then, I'm really not good at Mysql index management ! – Raphaël Althaus Jun 05 '14 at 15:25
  • 6
    PostgreSQL should support grouping by primary-key-only since 9.1 http://www.postgresql.org/docs/current/static/release-9-1.html#AEN120856 – Richard Huxton Jun 05 '14 at 16:24
  • @RichardHuxton ah great! I'll ask to get our test environment updated; I didn't realise we were on an old version. – Dave Jun 05 '14 at 16:36
  • @RichardHuxton: Works just fine for columns of a single table. Not for columns of additional joined tables, though. – Erwin Brandstetter Jun 05 '14 at 16:51

3 Answers3

2

In more recent versions of MySql you might have sql_mode=only_full_group_by enabled which doesn't allow to select non-aggregated columns when using group by i.e. it forces you to use a function like max() or avg() or group_concat(), sometimes you just want any value.

This flag is enabled by default in MySql 5.7.

The function any_value() is available when that flag is enabled.

You can achieve the same effect without disabling ONLY_FULL_GROUP_BY by using ANY_VALUE() to refer to the nonaggregated column.

select t.index, any_value(t.insert_date)
from my_table t
group by t.index;

More information here: https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by and here: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html

santiago arizti
  • 4,175
  • 3
  • 37
  • 50
1

In Postgres (not in MySQL, though), you could use DISTINCT ON to pick a single, consistent row per value (or group of values) without aggregating them:

SELECT DISTINCT ON (n.node_id)
       *                 -- select any or all columns of all joined tables
FROM   {a couple of joined tables}
JOIN   nodes n USING (node_id)

That gives you a single, arbitrary row for each node_id. to pick a specific row, add:

ORDER  BY n.node_id, ... -- what to sort first?

.. add more ORDER BY items to pick a specific row. Details:
Select first row in each GROUP BY group?

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

You could try converting the other columns into aggregates:

SELECT myPrimaryKey, MAX(otherThing)
FROM myTable
GROUP BY myPrimaryKey
mc110
  • 2,825
  • 5
  • 20
  • 21
  • I like this trick, and I'll be using it for now. But as RichardHuxton pointed out, the behaviour I'm looking for is allowed in Postgres 9.1+, so as a more permanent solution I'll be asking to get the test database updated. – Dave Jun 05 '14 at 16:39
  • WHAT SHOULD BE DONE, WHEN I HAVE TO PROJECT ONLY PRIMARY KEY? – iVikashJha Apr 17 '17 at 12:49