
I am trying to write a Dune Analytics query in their native SQL on the Dune v2 engine. I'm not very experienced in writing SQL queries; it's been 10+ years since I last wrote any.

At the moment I have written the following query, which works as intended (you can see my public dashboard):

select
  collection,
  count(collection) as num_transactions
from
  nft.trades
where
  blockchain = "ethereum"
  and collection <> ""
  and block_time >= '{{from_date}}'
group by 1
order by 2 desc

This returns a table like so:

collection                  | num_transactions
----------------------------------------------
GrandpaApes                 | 4666
Ethereum Name Service (ENS) | 4545
OpenSea Collection          | 2319

As you can see from the dashboard, this correctly returns NFT trades grouped by collection in descending order of the number of transactions.

What I'm trying to do is to add a column that contains a running cumulative total of num_transactions - let's call this tot_transactions. The result I want is something like this:

collection                  | num_transactions | tot_transactions
-----------------------------------------------------------------
GrandpaApes                 | 4666             | 4666
Ethereum Name Service (ENS) | 4545             | 9211
OpenSea Collection          | 2319             | 11530

I tried to do this by amending the query as below, but it throws an error. I've read a bunch of the other SQL questions on SO, but so far I haven't been able to get any of them to work. How should I write my query to make this succeed in Dune?

select
  collection,
  count(collection) as num_transactions,
  sum(count(collection)) as tot_transactions
from
  nft.trades
where
  blockchain = "{{blockchain}}"
  and collection <> ""
  and block_time >= '{{from_date}}'
group by 1
order by 2 desc

Error message:

Error: {42000} [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.

Note: I can see it's telling me I can't use an aggregate inside another aggregate (I assume this means I can't use count() inside sum()), but I tried using a subquery and couldn't get that to work either (apologies - I no longer have the version of the query that had the subquery in it).

Zkoh

1 Answer


The error message is coming from Apache Hive SQL. Looking at the Apache Hive documentation, windowing functions are supported, so replace your sum(count(collection)) with

Sum(count(collection)) Over (Order By count(collection) Desc Rows Between Unbounded Preceding And Current Row)

Yes, it's quite verbose.
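For reference, a minimal sketch of how that windowed sum might slot into the original query (assuming Dune's v2 Spark SQL engine accepts this syntax; the nft.trades table and the {{blockchain}} / {{from_date}} parameters are taken from the question):

select
  collection,
  count(collection) as num_transactions,
  -- running total of the per-collection counts, accumulated in the
  -- same descending order as the final output
  sum(count(collection)) over (
    order by count(collection) desc
    rows between unbounded preceding and current row
  ) as tot_transactions
from
  nft.trades
where
  blockchain = "{{blockchain}}"
  and collection <> ""
  and block_time >= '{{from_date}}'
group by 1
order by 2 desc

This works because window functions are evaluated after GROUP BY, so wrapping count(collection) in a windowed sum is allowed even though nesting it in a plain aggregate is not.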

If that does not work, wrap this query inside a second query and apply the windowed sum in the outer query, as sketched below.
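A sketch of that two-step version, assuming the same table and parameters as above; the inner query is the asker's original working query, and the outer query adds the running total:

select
  collection,
  num_transactions,
  -- running total computed over the already-aggregated rows
  sum(num_transactions) over (
    order by num_transactions desc
    rows between unbounded preceding and current row
  ) as tot_transactions
from (
  select
    collection,
    count(collection) as num_transactions
  from
    nft.trades
  where
    blockchain = "{{blockchain}}"
    and collection <> ""
    and block_time >= '{{from_date}}'
  group by 1
) t
order by 2 desc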

Chris Maurer