1

Using a query along with the functions available in postgres such as string_to_array and string_agg, data from raw tables are converted to following results set.

id, text
001, {foo,boo,foo}
002, {"",for,test,friday}
003, {"","",test,friday,tuesday,foo,boo}

Here id is id of a person and text is actually type of array. Now what I am trying to do is generate following structure.

id, text, text_count
001, foo, 2
001, boo, 1
002, test, 1
002, friday, 1 

This is the query I used to get my existing format that I have mentioned, but how can I enhance this query to get the id, text, text_count results.

select id, string_to_array(string_agg(b.text,' '), ' ') as words
from tableA a,tableB b group by id

I would also like to get rid of data with "", I believe they are empty strings in postgres but not really sure.

add-semi-colons
  • 18,094
  • 55
  • 145
  • 232

1 Answers1

1

Use unnest().
Assuming id to be unique:

SELECT id, txt, count(*) As txt_count
FROM  (
   SELECT id
        , unnest(txt) AS txt
   FROM   tbl
   ) sub
WHERE  txt <> ''
GROUP  BY id, txt
ORDER  BY id, txt;

txt instead of text, because I never use basic type names as identifiers.
The condition WHERE txt <> '' removes both empty stings ('') as well as NULL values.

When unnesting an array, you get as many rows in the result set as there are elements in the array. Careful, when unnesting multiple arrays in parallel:
Is there something like a zip() function in PostgreSQL that combines two arrays?
Parallel unnest() and sort order in PostgreSQL

There is a cleaner syntax variant in Postgres 9.3+ for this with a LATERAL JOIN:

SELECT id, txt, count(*) As txt_count
FROM  (
   SELECT id, x.txt
   FROM   tbl t, unnest(t.txt) x(txt)
   ) sub
WHERE  txt <> ''
GROUP  BY id, txt
ORDER  BY id, txt;

Details here:
PostgreSQL unnest() with element number

SQL Fiddle.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Can I used my query inside values where because all my query does is spit out those values. – add-semi-colons Jun 13 '14 at 16:31
  • 1
    @Null-Hypothesis: I added a cleaner form using a table instead of the VALUES expression (which was just a quick substitute for the actual table). And a fiddle. – Erwin Brandstetter Jun 13 '14 at 16:39
  • So I don't need to use string_to_array(string_agg(b.text,' '), ' ')? because without that query i don't have the data that I need to use for the query that you have mentioned. – add-semi-colons Jun 13 '14 at 16:59
  • 1
    @Null-Hypothesis: You wrote `text is actually type of array`. If it's an array (`text[]`) like in my sqlfiddle, all you need is `unnest()`. It's always better to provide a *table definition* in the question ... – Erwin Brandstetter Jun 13 '14 at 18:25