I have several schema with same-named tables that I would like to quickly retrieve some statistics on text data (e.g., word frequency) across hundreds of groups.
Here is an example query:
SELECT subreddit,
round(avg(CASE WHEN text ILIKE '%trump%' THEN 1
ELSE 0 END),
5) AS trump_relative_frequency,
count(*)
FROM politics.comments
GROUP BY subreddit
HAVING count(*) > 100
ORDER BY trump_relative_frequency DESC;
subreddit | trump_relative_frequency | count
-----------------------+--------------------------+--------
NeverTrump | 0.34457 | 267
BannedFromThe_Donald | 0.28261 | 138
askhillarysupporters | 0.26429 | 140
Impeach_Trump | 0.25210 | 357
AmericanPolitics | 0.24606 | 317
AskThe_Donald | 0.24561 | 1653
AskTrumpSupporters | 0.24044 | 3215
KasichForPresident | 0.23519 | 591
HillaryForAmerica | 0.21578 | 431
WomenForTrump | 0.20280 | 143
Republican | 0.19994 | 3356
EnoughTrumpSpam | 0.19972 | 15457
I would like to quickly execute similar queries for a variety of strings other than 'trump'
, but it seems that I must declare the column names that are being returned, which can technically be determined based on the parameters, but not when you only construct the function.
The only workaround I see is returning a table containing one column for the search parameter and another containing an HSTORE containing subreddit:count key-value pairs. This isn't ideal, as I would like to export the data to CSV with the given column names above, and the only other solution I have is using Python or another programming language to execute the query via a command-line script.
Is there a better way of going about this?