3

I'm working with data in PostgreSQL that uses a data dictionary table to provide descriptions for the column (variable) names of other tables in the dataset. For example:

Table 1:

a00600 | a00900
-------+-------
row 1  | row 1
row 2  | row 2

Data Dictionary (Key) columns:

Variable | Description
---------+------------
a00600   | Total population
a00900   | Zipcode

For reporting purposes, how do I write SQL to perform the following dynamically (without specifying each column name)?

SELECT 'a00600' AS (SELECT Key.Description
WHERE Key.Variable = 'a00600')
FROM Table 1;

I realize there's likely a better way to parse this question/problem and am open to any ideas for what I need to accomplish.

  • wha? Why would you even _do_ this (store column names this way)? I'm assuming there's typos in some of the data/queries, otherwise it should match? And I'm not sure you can query for aliases in-statement... DB2 doesn't at least, and I'm fairly sure _no_ RDBMS would allow that. – Clockwork-Muse Sep 04 '12 at 19:05
  • @X-Zero: It can't be done in plain SQL. You need dynamic SQL - basically prepare the statement and then execute it - with possible enhancements like plpgsql offers. – Erwin Brandstetter Sep 04 '12 at 19:08
  • @X-Zero: Most data provided from US gov't statistical agencies (Census, BLS, etc.) is in this format. Column names are much easier to work with this way... you wouldn't want to SELECT "Taxable individual retirement arrangements distributions" as a column. – bitter_buffalo Sep 04 '12 at 19:24
  • @bitter_buffalo - Granted, that column name is a bit of a mouthful. But at least it's _understandable_ (err, to CPAs, at least, hopefully). It's also (generally) much harder for a typo in the 'proper' name to turn it into a different column (as opposed to just failing), which could be a maintenance issue. I've had enough of 8-character cryptic names in RPG, thank you very much. – Clockwork-Muse Sep 04 '12 at 19:58

2 Answers2

2

You need to use dynamic SQL with a procedural language function. Usually plpgsql and use EXECUTE with it.

The tricky part is to define the return type at creation time.

I have compiled a number of solutions in this related answer.
There are lots of related answer on SO already. Search for combinations of terms like [plpgsql] EXECUTE RETURN QUERY [dynamic-sql] quote_ident.


Your approach is commonly frowned upon among database designers.
My personal opinion: I wouldn't go that route. I always use basic, descriptive names. You can always add more décor in your application if needed.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for this lead... I'll be researching dynamic SQL. Of course, I would prefer the description table be tied into a star schema, but I'm forced to use the data as made available by the gov't at this point. – bitter_buffalo Sep 04 '12 at 20:08
0

Another way to get the descriptions instead of the actual column names would be to create views (one for every table). This can be automated by generating the views automatically. This looks rather clumsy, but it has the huge advantage that for "complex* queries the resulting queryplans will be axactly the same as for the original columns names. (functions joined into complex queries will perform badly: the optimiser cannot take them apart, so the resulting behavior will be equivalent to "row at a time") Example:

-- tmp schema is only for testing
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

CREATE TABLE thedata
        ( a00600 varchar
        , a00900 varchar
        );
INSERT INTO thedata(a00600 , a00900) VALUES
 ('key1', 'data1')
,('key2', 'data2');

CREATE TABLE thedict
        ( variable varchar
        , description varchar
        );

INSERT INTO thedict(variable , description) VALUES
 ('a00600'   , 'Total population')
,('a00900'   , 'Zipcode' );

CREATE OR REPLACE FUNCTION create_view_definition(zname varchar)
  RETURNS varchar AS
$BODY$
DECLARE
   thestring varchar;
   therecord RECORD;
   iter INTEGER ;
   thecurs cursor for
        SELECT co.attname AS zname, d.description AS zdesc
        FROM pg_class ct
        JOIN pg_namespace cs ON cs.oid=ct.relnamespace
        JOIN pg_attribute co ON co.attrelid = ct.oid AND co.attnum > 0
        LEFT JOIN thedict d ON d.variable = co.attname
        WHERE ct.relname = 'thedata'
        AND cs.nspname = 'tmp'
        ;
BEGIN
        thestring = '' ;
        iter = 0;
        FOR therecord IN thecurs LOOP
                IF (iter = 0) THEN
                        thestring = 'CREATE VIEW ' || quote_ident('v'||zname) || ' AS ( SELECT ' ;
                ELSE
                        thestring = thestring || ', ';
                END IF;
                iter=iter+1;

                thestring = thestring || quote_ident(therecord.zname);

                IF (therecord.zdesc IS NOT NULL) THEN
                        thestring = thestring || ' AS ' || quote_ident(therecord.zdesc);
                END IF;

        END LOOP;
        IF (iter > 0) THEN
                thestring = thestring || ' FROM ' || quote_ident(zname) || ' )' ;
        END IF;

RETURN thestring;

END;
$BODY$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION execute_view_definition(zname varchar)
  RETURNS INTEGER AS
$BODY$
DECLARE
   meat varchar;
BEGIN
   meat = create_view_definition(zname);
   EXECUTE meat;
RETURN 0;

END;
$BODY$ LANGUAGE plpgsql;

SELECT create_view_definition('thedata');
SELECT execute_view_definition('thedata');

SELECT * FROM vthedata;

RESULT:

CREATE FUNCTION
CREATE FUNCTION
                                      create_view_definition                                       
---------------------------------------------------------------------------------------------------
 CREATE VIEW vthedata AS ( SELECT a00600 AS "Total population", a00900 AS "Zipcode" FROM thedata )
(1 row)

 execute_view_definition 
-------------------------
                       0
(1 row)

 Total population | Zipcode 
------------------+---------
 key1             | data1
 key2             | data2
(2 rows)

Please note this is only an example. If it were for real, I would at least put the generated views into a separate schema, to avoid name collisions and pollution of the original schema.

wildplasser
  • 43,142
  • 8
  • 66
  • 109