I'm having troubles making running totals work for me in BigQuery.
I've found an example that works here: BigQuery SQL running totals
SELECT word, word_count, SUM(word_count) OVER(ORDER BY word DESC)
FROM [publicdata:samples.shakespeare]
WHERE corpus = 'hamlet'
AND word > 'a' LIMIT 30
But what I really want to do - is calculate number of most popular words that cover 80% of total word_count. So I tried to calculate running total while ordering by word_count first:
SELECT word, word_count, SUM(word_count) OVER(ORDER BY word_count DESC)
FROM [publicdata:samples.shakespeare]
WHERE corpus = 'hamlet'
AND word > 'a' LIMIT 30
But I get this:
Row word word_count f0_
1 o'er 18 18
2 answer 13 31
3 meet 8 39
4 told 5 44
5 treason 4 **52**
6 quality 4 **52**
7 brave 3 55
The running total is not increased from line 5 to line 6. Probably because in both cases word_count is 4.
What am I doing wrong?
Maybe there is a better way? My plan was to calculate running total. Then divide it by sum(word_count) OVER() and filter only rows with less than 80%. Then count number of these rows.