0

I'm converting a huge keystore table to columns, with the following query. However, I need to remove the sparse columns and have only one row per id. How?

I can't use coalesce, Postgres 8.4. I can't install any packages.

Query:

(select id 
    , case when udfname = 'ptid'      then udfvalue end as "ptid"
    , case when udfname = 'col1'      then udfvalue end as "col1"
    , case when udfname = 'col2'      then udfvalue end as "col2"
    , case when udfname = 'col3'      then udfvalue end as "col3"
    , case when udfname = 'col4'      then udfvalue end as "col4"
    , case when udfname = 'xref_col5' then udfvalue end as "xref_col5"
from uglykeystoretable where sampleid = 656556) -- restricting to one id so it won't clog. 
;

sample result

This substring(array_to_string()) hack makes it work, however there must be a cleaner solution. Let me know.

select 
substring ( array_to_string(array_agg(sampleid), ',') ,  1, strpos(   array_to_string(array_agg(sampleid), ','), ',')-1 ) as sampleid
,array_to_string(array_agg(pid), ',') as pid
,array_to_string(array_agg(dob), ',') as dob
,array_to_string(array_agg(seq_res), ', ')  as seq_res
,array_to_string(array_agg(sx), ',')  as sx
,array_to_string(array_agg(wrkspc), ',')  as wrkspc
,array_to_string(array_agg(xref_isq_sid), ',') as xref_isq_sid 
,array_to_string(array_agg(dt_coll), ',')  as dt_coll
,array_to_string(array_agg(lcus), ',')  as lcus
,array_to_string(array_agg(spcs), ',') as spcs
,nullif(array_to_string(array_agg(fname), ''),'') as fname
,nullif(array_to_string(array_agg(lname), ''),'')  as lname
from 
(select sampleid
    , (case when udfname = 'pid'            then udfvalue end) as "pid"
    , (case when udfname = 'dob'            then udfvalue end) as "dob"
    , case when udfname = 'Sequencing Resolution'       then udfvalue end as "seq_res"
    , case when udfname = 'sx'              then udfvalue end as "sx"
    , case when udfname = 'wrkspc'          then udfvalue end as "wrkspc"
    , case when udfname = 'xref_iseq_sid'       then udfvalue end as "xref_isq_sid"
    , case when udfname = 'dt_coll'         then udfvalue end as "dt_coll"
    , case when udfname = 'lcus'                then udfvalue end as "lcus"
    , case when udfname = 'spcs'                then udfvalue end as "spcs"
    , case when udfname = 'lname'           then udfvalue end as "lname"
    , case when udfname = 'fname'           then udfvalue end as "fname"
from ugly_keystore_table
--where sid >( SELECT MAX(sid)-1000 FROM MyEntity)
) as sample_details
group by sid
;
Espresso
  • 5,378
  • 4
  • 35
  • 66
  • The essential thing for your question would be the exact table definition of `uglykeystoretable` and an example value. Your added query is incomplete and [Postgres 8.4 has `COALESCE`](http://www.postgresql.org/docs/8.4/interactive/functions-conditional.html#AEN15281), btw. (Not saying you'd need it to solve your case.) – Erwin Brandstetter Jan 29 '16 at 19:28

3 Answers3

1

Making some assumptions about the actual definition of uglykeystoretable, crosstab() should be shortest and fastest:

SELECT * FROM crosstab(
   'SELECT id, udfname, udfvalue
    FROM   uglykeystoretable
    ORDER  BY 1,2'
 , 'SELECT unnest('{ptid,col1,col2,col3,col4,xref_col5}'::text[])'
   ) AS t (id int, ptid text, col1 text, col2 text, col3 text, col4 text, xref_col5 text);

Data types must match.
You need the additional module tablefunc. Detailed instructions:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks, but I need to do without crostab, I put postgres 8.4 in title, but someone removed it. – Espresso Jan 29 '16 at 20:30
  • I removed it, it doesn't need to be in the title, it's in the question body. You *did* mention that you can't install any packages, though. So this answer is not an option for you (unless `tablefunc` is installed already). – Erwin Brandstetter Jan 29 '16 at 21:10
0

Use an aggregate to get rid of the null values:

select id, 
       max(case when udfname = 'ptid' then udfvalue end) as "ptid", 
       max(case when udfname = 'col1' then udfvalue end) as "col1", 
       max(case when udfname = 'col2' then udfvalue end) as "col2", 
       max(case when udfname = 'col3' then udfvalue end) as "col3",
       max(case when udfname = 'col4' then udfvalue end) as "col4",
       max(case when udfname = 'xref_col5' then udfvalue end) as "xref_col5"
from uglykeystoretable 
where sampleid = 656556
group by id;
0

The solution in the edit worked without crostab() installation.

Solution1 :

select 
substring ( array_to_string(array_agg(sampleid), ',') ,  1, strpos(   array_to_string(array_agg(sampleid), ','), ',')-1 ) as sampleid
,array_to_string(array_agg(pid), ',') as pid
,array_to_string(array_agg(dob), ',') as dob
,array_to_string(array_agg(seq_res), ', ')  as seq_res
,array_to_string(array_agg(sx), ',')  as sx
,array_to_string(array_agg(wrkspc), ',')  as wrkspc
,array_to_string(array_agg(xref_isq_sid), ',') as xref_isq_sid 
,array_to_string(array_agg(dt_coll), ',')  as dt_coll
,array_to_string(array_agg(lcus), ',')  as lcus
,array_to_string(array_agg(spcs), ',') as spcs
,nullif(array_to_string(array_agg(fname), ''),'') as fname
,nullif(array_to_string(array_agg(lname), ''),'')  as lname
from 
(select sampleid
    , (case when udfname = 'pid'            then udfvalue end) as "pid"
    , (case when udfname = 'dob'            then udfvalue end) as "dob"
    , case when udfname = 'Sequencing Resolution'       then udfvalue end as "seq_res"
    , case when udfname = 'sx'              then udfvalue end as "sx"
    , case when udfname = 'wrkspc'          then udfvalue end as "wrkspc"
    , case when udfname = 'xref_iseq_sid'       then udfvalue end as "xref_isq_sid"
    , case when udfname = 'dt_coll'         then udfvalue end as "dt_coll"
    , case when udfname = 'lcus'                then udfvalue end as "lcus"
    , case when udfname = 'spcs'                then udfvalue end as "spcs"
    , case when udfname = 'lname'           then udfvalue end as "lname"
    , case when udfname = 'fname'           then udfvalue end as "fname"
from ugly_keystore_table
--where sid >( SELECT MAX(sid)-1000 FROM MyEntity)
) as sample_details
group by sid
;

Solution 2 (with joins) very slow :

SELECT v.sampleid
        , l1.udfvalue AS lab1
        , l2.udfvalue AS lab2
        , l3.udfvalue AS lab3
        , l4.udfvalue AS lab4
        , l5.udfvalue AS lab5
        , l6.udfvalue AS lab6
        , l7.udfvalue AS lab6
        , l8.udfvalue AS lab6
        , l9.udfvalue AS lab6
        , l10.udfvalue AS lab6
FROM sample_udf_view v
LEFT JOIN sample_udf_view l1 ON l1.sampleid = v.sampleid AND l1.udfname = 'pid'
LEFT JOIN sample_udf_view l2 ON l2.sampleid = v.sampleid AND l2.udfname = 'dob'
LEFT JOIN sample_udf_view l3 ON l3.sampleid = v.sampleid AND l3.udfname = 'seq_res'
LEFT JOIN sample_udf_view l4 ON l4.sampleid = v.sampleid AND l4.udfname = 'Sx'
LEFT JOIN sample_udf_view l5 ON l5.sampleid = v.sampleid AND l5.udfname = 'wspc'
LEFT JOIN sample_udf_view l6 ON l6.sampleid = v.sampleid AND l6.udfname = 'dt_col'
LEFT JOIN sample_udf_view l7 ON l7.sampleid = v.sampleid AND l7.udfname = 'loc'
LEFT JOIN sample_udf_view l8 ON l8.sampleid = v.sampleid AND l8.udfname = 'specs'
LEFT JOIN sample_udf_view l9 ON l9.sampleid = v.sampleid AND l9.udfname = 'fname'
LEFT JOIN sample_udf_view l10 ON l10.sampleid = v.sampleid AND l10.udfname = 'lname';
Espresso
  • 5,378
  • 4
  • 35
  • 66