
I have a table with a couple hundred columns on which I want to use the GREATEST function. How can I avoid having to put each and every column name into the query?

The columns on which I want to use GREATEST share a common prefix, in case that helps.

This is how I am doing it now:

CREATE TEMP TABLE foo (id int, cc_a int, cc_b int, cc_c int);
INSERT INTO foo VALUES (3, 1, 2, 3), (4, 15, 4, 100);

SELECT id, GREATEST(cc_a, cc_b, cc_c) FROM foo;
Alex
  • That's an indication of bad data modeling – David דודו Markovitz Aug 27 '22 at 11:32
  • @DavidדודוMarkovitz The data is stored this way for efficiency reasons. If the data were stored in a longer table, so that column-wise operations would be applicable, it would just explode. But if there are some other general hints on preventing row-wise operations, I am eager to hear them. – Alex Aug 27 '22 at 11:49
    "It would just explode"? care to explain? – David דודו Markovitz Aug 27 '22 at 11:58
  • GREATEST() can't use an index, like MAX() would with a proper 3NF data model. – Frank Heikens Aug 27 '22 at 12:03
  • It is a time series table. For each timestamp I have the data for around 1000 units. Then I have 100 columns I denoted with the `cc_` prefix in the example. If I build the table like this: `timestamp | unit_id | cc_type | cc_value`, I would get 1000 * 100 rows per timestamp. The time frequency is only hourly, but even then I would get 2.4 million rows per day. The way it is right now it is only 24,000. Hope this was understandable. – Alex Aug 27 '22 at 12:08
  • We handle about 40 million new records a day, and are not even close to the limits of our hardware or PostgreSQL. To me it looks like you made it pretty complex and are now getting into trouble. Are there any other performance and/or code issues that you are already facing? – Frank Heikens Aug 27 '22 at 12:13
  • Still setting up this DB for a new project. It is the first time I have run into an issue. I guess I should put that data in a separate table then, in the format I described in my last comment? – Alex Aug 27 '22 at 12:27
  • Why not use a solution that is built for time series? For a PostgreSQL-based solution you can take a look at timescaledb (https://github.com/timescale/timescaledb). My personal favorite for the last 5 years has been Azure Data Explorer, aka ADX, aka Kusto, and recently I joined its product group. https://azure.microsoft.com/en-us/services/data-explorer/ – David דודו Markovitz Aug 27 '22 at 12:49
  • For this project we cannot use cloud-based solutions, and we are already using timescale. – Alex Aug 28 '22 at 10:35
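
For reference, a minimal sketch of the long-format ("proper 3NF") layout discussed in the comments above; the table and column names here are illustrative, not taken from the actual project:

CREATE TABLE cc_measurements (
    ts       timestamptz NOT NULL,
    unit_id  int         NOT NULL,
    cc_type  text        NOT NULL,
    cc_value int         NOT NULL
);

-- GREATEST across columns becomes a plain MAX across rows, which the planner
-- can support with an index such as (ts, unit_id, cc_value):
SELECT ts, unit_id, max(cc_value)
FROM cc_measurements
GROUP BY ts, unit_id;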

2 Answers


You can turn a row into a list of values using e.g. JSON functions:

SELECT id,
       (SELECT max(x.val::int)
        FROM jsonb_each(to_jsonb(f) - 'id') AS x(col, val))
FROM foo f;

Online example

But I do agree that this smells like a bad design.
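
Since the question mentions that the relevant columns share a common prefix, a hedged variation of the above keeps only the matching keys instead of subtracting 'id' (the prefix 'cc_' is assumed from the example):

SELECT id,
       (SELECT max(x.val::int)
        FROM jsonb_each_text(to_jsonb(f)) AS x(col, val)
        WHERE x.col LIKE 'cc\_%')  -- the backslash escapes the _ wildcard
FROM foo f;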

  • Thank you! This works. Since I got a lot of feedback on the probably inappropriate data model, I asked a follow-up question here: https://stackoverflow.com/questions/73517740/how-to-efficiently-compute-in-step-1-differences-between-columns-and-in-step-2-a – Alex Aug 28 '22 at 10:13

Since you want to operate on a large number of values per id, it's time to bite the bullet and create an UNPIVOTed view of your table.

Con: you have to type out the hundred column names.
Pro: you only have to do it once in your code base.

I know you are working with PostgreSQL, so there are solutions such as the hstore-based solution which, according to its answerer, may work on an arbitrarily large number of columns.
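
As a rough, hedged sketch (not tested here), that hstore variant could look something like this for the example table, assuming CREATE EXTENSION hstore has been run:

-- hstore(f) turns the whole row into key/value pairs; subtracting 'id' drops that column.
SELECT id,
       (SELECT max(kv.value::int)
        FROM each(hstore(f) - 'id'::text) AS kv(key, value))
FROM foo f;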

If you can't use the hstore extension (it's not installed on dbfiddle.com), you can fall back on a LATERAL JOIN, inspired by https://blog.sql-workbench.eu/post/unpivot-with-postgres/. I tested a solution in this fiddle.
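
For illustration, a minimal sketch of such a LATERAL unpivot view for the example table from the question; with the real table the VALUES list would have to spell out all the cc_ columns, but only once, in this view definition:

CREATE VIEW foo_unpivot AS
SELECT f.id, v.col, v.val
FROM foo f
CROSS JOIN LATERAL (
    VALUES ('cc_a', f.cc_a),
           ('cc_b', f.cc_b),
           ('cc_c', f.cc_c)
) AS v(col, val);

-- GREATEST across columns then becomes MAX across rows:
SELECT id, max(val) FROM foo_unpivot GROUP BY id;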

Johan Buret