I have a database with about 2500 results from 2022. I'm trying to build queries that will show me the top industry
and country
results from each month throughout the entire year. The preferred output would include the top X (number) of each category. I'm fine with breaking this out into two queries, one for each category, if needed.
The table looks like this:
post_id int
post_date date
post_victim text
threat_group text
victim_industry text
victim_country text
The idea here is that I can use these queries to put together data to brief teammates or generate graphs for visualizing the resulting data.
Here's an example of what I'd like to output via a SQL query (shortened to just Jan-Apr, but the final result should cover the entire year):
Industry | Jan | Feb | Mar | Apr |
---|---|---|---|---|
Healthcare | 95 | 79 | 58 | 12 |
Manufacturing | 45 | 90 | 72 | 65 |
Finance | 31 | 48 | 21 | 73 |
Education | 63 | 70 | 46 | 92 |
Technology | 86 | 34 | 99 | 25 |
Construction | 53 | 73 | 16 | 29 |
Retail | 70 | 90 | 95 | 73 |
Government | 95 | 73 | 21 | 58 |
Agriculture | 34 | 53 | 61 | 99 |
Transportation | 60 | 89 | 22 | 38 |
Hospitality | 70 | 38 | 96 | 79 |
Legal | 89 | 67 | 92 | 43 |
I tried using crosstab and specifying the following as ct (...
:
SELECT *
FROM crosstab(
$$
SELECT to_char(post_date, 'YYYY-MM') as month, victim_industry, count(*)
FROM ransomware_posts
WHERE post_date >= '2022-01-01' AND post_date < '2023-01-01'
GROUP BY month, victim_industry
ORDER BY month, count DESC
$$
) as ct (month text, industry_1 text, count_1 int, industry_2 text, count_2 int)
I think the issue may have something to do with the fact that there are more than 2 results per month, so the output is receiving more results than it's expecting, but I don't know for sure as I've never built a functioning crosstab query.
Here's the error I receive:
ERROR: return and sql tuple descriptions are incompatible
SQL state: 42601