How do I sum all numbers from output of jq

Question

I have a command, and I would like to sum all the numbers in its output.

The command looks like this:

$(hadoop fs -ls -R /reports/dt=2018-08-27 | grep _stats.json | awk '{print $NF}' | xargs hadoop fs -cat | jq '.duration')

This lists everything under /reports/dt=2018-08-27, keeps only the _stats.json files, passes their contents through hadoop fs -cat into jq, and extracts only .duration from each JSON document. In the end I get a result like this:

1211789 1211789 373585 495379 1211789

But I would like the command to sum all those numbers together, giving 4504331.

I suspect that you're running `echo $val` rather than `echo "$val"` in printing your result -- otherwise, I'd expect newlines instead of spaces between the values, as `jq` output is newline-separated unless explicit action is taken to change this behavior (but an `echo` with an unquoted argument *is* such specific action, as described in [BashPitfalls #14](http://mywiki.wooledge.org/BashPitfalls#echo_.24foo)). — Charles Duffy, Aug 28 '18 at 20:32
@glennjackman, ...hmm. Almost a shame this was tagged as a bash question (since this *is* duplicative in bash) rather than a jq question (since there's a distinct and useful answer specific to that toolchain). — Charles Duffy, Aug 28 '18 at 21:21

törzsmókus · Answer 1 · 2022-07-01T15:14:48.807

51

the simplest solution is the add filter:

jq '[.duration] | add'

the [ brackets ] are needed around the value to sum because add sums the values of an array, not a stream. (for stream summation, you would need a more sophisticated solution, e.g. using reduce, as detailed in other answers.)

depending on the exact format of the input, you may need some preprocessing to get this right.

e.g. for the sample input in Charles Duffy’s answer either

use inputs (note that -n is needed; without it, jq consumes the first object as . and inputs only yields the rest, so the first value is left out of the sum):
```
jq -n '[inputs.duration] | add' <<< "$sample_data"
```

or slurp (-s) and iterate (.[]) / map:

jq -s '[.[].duration] | add' <<< "$sample_data"
jq -s 'map(.duration) | add' <<< "$sample_data"
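
To see why the brackets alone are not enough for this kind of input, here is a minimal sketch. It assumes the input is a stream of separate JSON objects (the values are the ones from the question) and uses a pipe instead of a here-string so it also runs under plain sh. The `// 0` fallback is an optional extra for the empty-input case, not something the original command requires.

```shell
# Sample input: a stream of separate JSON objects (values from the question).
sample_data='{"duration": 1211789}
{"duration": 1211789}
{"duration": 373585}
{"duration": 495379}
{"duration": 1211789}'

# Without -n or -s the filter runs once per object, so each array has one element
# and the "sum" is just that element: five numbers come out, not one total.
per_object=$(printf '%s\n' "$sample_data" | jq '[.duration] | add')

# With -s all objects are slurped into one array first, so add sees every value.
total=$(printf '%s\n' "$sample_data" | jq -s 'map(.duration) | add')

# add on an empty array yields null; the alternative operator // supplies 0 instead.
empty_total=$(jq -n '[] | add // 0')

printf 'per object:\n%s\n' "$per_object"
printf 'total: %s\n' "$total"
printf 'empty: %s\n' "$empty_total"
```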

edited Jul 01 '22 at 15:14

answered Mar 20 '20 at 09:56

törzsmókus

For me `jq '[.duration] | add' <<< "$sample_data"` does not produce the sum of the values but outputs the list of numbers. – Romain Oct 24 '20 at 04:02
@Romain I assume you came from [Charles Duffy’s answer](https://stackoverflow.com/a/52065738/501765); I am going to update mine with caveats for that kind of input – törzsmókus Oct 28 '20 at 11:14
Just a sidenote: Giving an empty array to `add` produces `null`, not `0`, which might mess up following operations. – Marki Dec 06 '20 at 12:41
I would write the last variant as `jq -s 'map(.duration) | add' <<< "$sample_data"` which is a bit more intuitive – Qrilka Jul 01 '22 at 07:04

score 17 · Accepted Answer · answered Aug 28 '18 at 20:27

Another option (and one that works even if not all your durations are integers) is to make your jq code do the work:

sample_data='{"duration": 1211789}
{"duration": 1211789}
{"duration": 373585}
{"duration": 495379}
{"duration": 1211789}'

jq -n '[inputs | .duration] | reduce .[] as $num (0; .+$num)' <<<"$sample_data"

...properly emits as output:

4504331

Replace the <<<"$sample_data" with a pipeline on stdin as desired.
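
As a sketch of that substitution, the example below feeds jq from a pipe and uses non-integer durations. The `emit_stats` function and its values are hypothetical stand-ins for the real `hadoop ... | xargs hadoop fs -cat` pipeline, and the reduce here runs directly over the stream rather than over an intermediate array, which is a small variation on the command above.

```shell
# Hypothetical stand-in for the real pipeline; prints one JSON object per line.
emit_stats() {
  printf '%s\n' '{"duration": 1.5}' '{"duration": 2.25}' '{"duration": 10}'
}

# -n stops jq from consuming the first object as ".", so inputs yields all of them.
# reduce folds each .duration into a running total that starts at 0.
sum=$(emit_stats | jq -n 'reduce (inputs | .duration) as $num (0; . + $num)')

echo "$sum"
```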

answered Aug 28 '18 at 20:27

Charles Duffy

nice but a bit overcomplicated, see [my answer](https://stackoverflow.com/a/60771918/501765) – törzsmókus Aug 04 '20 at 11:15

karakfa · Answer 3 · 2018-08-28T20:34:29.663

14

awk to the rescue!

$ ... | awk '{sum+=$0} END{print sum}'

4504331
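
A self-contained version of the same idea, assuming the numbers arrive one per line as jq prints them. The `durations` variable is a hypothetical stand-in for the elided pipeline.

```shell
# The five values from the question, one per line, as jq would print them.
durations='1211789
1211789
373585
495379
1211789'

# sum + 0 forces a numeric result, so empty input prints 0 instead of a blank line.
total=$(printf '%s\n' "$durations" | awk '{ sum += $1 } END { print sum + 0 }')

echo "$total"
```

For non-integer durations, awk's default output format (%.6g) may round the printed result; using printf with an explicit format in the END block avoids that.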

edited Aug 28 '18 at 20:34

answered Aug 28 '18 at 20:31

karakfa

score 12 · Answer 4 · answered Oct 04 '19 at 09:32

You can just use add now, as long as .duration is itself an array of numbers:

jq '.duration | add'

answered Oct 04 '19 at 09:32

Timmmm

Did not work for me :( It throws the following error: `jq: error (at <stdin>:0): Cannot iterate over number (362425)` – Jerald Sabu M Dec 18 '19 at 15:50
@JeraldSabuM wrap your values with an array – törzsmókus Mar 20 '20 at 09:55

peak · Answer 5 · 2018-08-29T20:55:20.570

3

For clarity and generality, it might be worthwhile defining sigma(s) to add a stream of numbers:

... | jq -n '
  def sigma(s): reduce s as $x(0;.+$x); 
  sigma(inputs | .duration)'
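
A runnable sketch of the definition above, with the question's values as hypothetical sample input in place of the elided pipeline. The second call illustrates that folding an empty stream from 0 yields 0.

```shell
# Sample input: a stream of separate JSON objects (values from the question).
sample_data='{"duration": 1211789}
{"duration": 1211789}
{"duration": 373585}
{"duration": 495379}
{"duration": 1211789}'

total=$(printf '%s\n' "$sample_data" | jq -n '
  # sigma(s): fold the stream s into a running total, starting from 0.
  def sigma(s): reduce s as $x (0; . + $x);
  sigma(inputs | .duration)')

# An empty stream leaves the initial value untouched, so the result is 0.
empty_total=$(jq -n 'def sigma(s): reduce s as $x (0; . + $x); sigma(empty)')

echo "$total"
echo "$empty_total"
```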

edited Aug 29 '18 at 20:55

answered Aug 29 '18 at 01:53

peak

While this might answer the author's question, it lacks some explaining words and links to documentation. Raw code snippets are not very helpful without some phrases around them. You may also find [how to write a good answer](https://stackoverflow.com/help/how-to-answer) very helpful. Please edit your answer. – hellow Aug 29 '18 at 06:59

score 0 · Answer 6 · answered Oct 24 '20 at 04:12

From a combination of other answers.

$ jq -n '[inputs | .duration] | add' <<< "$sample_data"

# 4504331

I had to collect the values into an array, [inputs | .duration], before summing them with add.

answered Oct 24 '20 at 04:12

Romain

score -1 · Answer 7 · answered Aug 28 '18 at 20:22

Use a for loop.

total=0
for num in $(hadoop fs -ls -R /reports/dt=2018-08-27 | grep _stats.json | awk '{print $NF}' | xargs hadoop fs -cat | jq '.duration')
do
    ((total += num))
done
echo "$total"
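
For comparison, here is a sketch of the `while read -r` form that comes up in the comments. The `emit_durations` function is a hypothetical stand-in for the hadoop and jq pipeline. The loop reads from a here-document rather than a pipe, because a loop on the receiving end of a pipe typically runs in a subshell, where the updated total would be lost. In bash, `done < <(emit_durations)` is an equivalent way to feed the loop.

```shell
# Hypothetical stand-in for the hadoop | jq pipeline: prints one number per line.
emit_durations() {
  printf '%s\n' 1211789 1211789 373585 495379 1211789
}

total=0
# Read one number per line and add it to the running total.
while read -r num; do
  total=$((total + num))
done <<EOF
$(emit_durations)
EOF

echo "$total"
```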

answered Aug 28 '18 at 20:22

Barmar

You really shouldn't write loops like that: https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_.2A.mp3.29 Note that most of that pipeline is redundant as well – DTSCode Aug 28 '18 at 21:06
@DTSCode Those pitfalls are all about filenames, which could contain spaces. This pipeline only produces numbers, there shouldn't be a problem. I agree that the pipeline could be improved, but it wasn't necessary to address the main question. – Barmar Aug 29 '18 at 15:35
I'm well aware of what the pitfall is for. Like I said previously, my point is that you shouldn't write for loops in that manner. There is always a better solution, bash or otherwise. – DTSCode Aug 29 '18 at 18:46
Why shouldn't I write a for loop like this when I know I'm just looping over numbers? There's no need to write this using `while read -r num ...` – Barmar Aug 29 '18 at 19:13
And other than combining `grep` and `awk`, I'm not sure what's redundant. Note that I'm not familiar with the `hadoop` command, so there could be ways to simplify those parts that I don't know. – Barmar Aug 29 '18 at 19:23
