1

I have several schema with same-named tables that I would like to quickly retrieve some statistics on text data (e.g., word frequency) across hundreds of groups.

Here is an example query:

SELECT subreddit, 
       round(avg(CASE WHEN text ILIKE '%trump%' THEN 1 
                        ELSE 0 END), 
             5) AS trump_relative_frequency, 
       count(*) 
       FROM politics.comments 
       GROUP BY subreddit 
       HAVING count(*) > 100 
       ORDER BY trump_relative_frequency DESC;

       subreddit       | trump_relative_frequency | count  
-----------------------+--------------------------+--------
 NeverTrump            |                  0.34457 |    267
 BannedFromThe_Donald  |                  0.28261 |    138
 askhillarysupporters  |                  0.26429 |    140
 Impeach_Trump         |                  0.25210 |    357
 AmericanPolitics      |                  0.24606 |    317
 AskThe_Donald         |                  0.24561 |   1653
 AskTrumpSupporters    |                  0.24044 |   3215
 KasichForPresident    |                  0.23519 |    591
 HillaryForAmerica     |                  0.21578 |    431
 WomenForTrump         |                  0.20280 |    143
 Republican            |                  0.19994 |   3356
 EnoughTrumpSpam       |                  0.19972 |  15457

I would like to quickly execute similar queries for a variety of strings other than 'trump', but it seems that I must declare the column names that are being returned, which can technically be determined based on the parameters, but not when you only construct the function.

The only workaround I see is returning a table containing one column for the search parameter and another containing an HSTORE containing subreddit:count key-value pairs. This isn't ideal, as I would like to export the data to CSV with the given column names above, and the only other solution I have is using Python or another programming language to execute the query via a command-line script.

Is there a better way of going about this?

Max Candocia
  • 4,294
  • 35
  • 58

0 Answers0