29

I have a command whose output is a list of numbers, and I would like to sum all of them.

The command looks like this:

$(hadoop fs -ls -R /reports/dt=2018-08-27 | grep _stats.json | awk '{print $NF}' | xargs hadoop fs -cat | jq '.duration')

It recursively lists everything under /reports/dt=2018-08-27, keeps only the _stats.json files, prints their contents with hadoop fs -cat, and passes that JSON through jq to extract .duration. In the end I get a result like this:

1211789 1211789 373585 495379 1211789

But I would like the command to sum all those numbers together, giving 4504331.

glenn jackman
toy
  • I suspect that you're running `echo $val` rather than `echo "$val"` in printing your result -- otherwise, I'd expect newlines instead of spaces between the values, as `jq` output is newline-separated unless explicit action is taken to change this behavior (but an `echo` with an unquoted argument *is* such specific action, as described in [BashPitfalls #14](http://mywiki.wooledge.org/BashPitfalls#echo_.24foo)). – Charles Duffy Aug 28 '18 at 20:32
  • @glennjackman, ...hmm. Almost a shame this was tagged as a bash question (since this *is* duplicative in bash) rather than a jq question (since there's a distinct and useful answer specific to that toolchain). – Charles Duffy Aug 28 '18 at 21:21
  • Agreed. I reopened and edited tags accordingly – glenn jackman Aug 28 '18 at 21:57
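A minimal sketch of the effect Charles Duffy describes, using printf as a stand-in for jq's newline-separated output (the values are the ones from the question; the variable name val is just for illustration):

```shell
# Stand-in for jq output: one number per line.
val=$(printf '%s\n' 1211789 1211789 373585 495379 1211789)

# Unquoted: the shell word-splits $val, so echo joins the values with spaces.
echo $val

# Quoted: the newlines are kept, so each value is printed on its own line.
echo "$val"
```

With the default IFS, the unquoted form prints exactly the space-separated line shown in the question.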

7 Answers

51

The simplest solution is the add filter:

jq '[.duration] | add'

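The [ brackets ] are needed around the value to sum because add sums the elements of an array, not a stream. (For stream summation you would need a more sophisticated solution, e.g. using reduce, as detailed in other answers.)

Note that jq runs the filter once per input document, so if each duration comes from a separate JSON object (as with the hadoop fs -cat output here), [.duration] | add simply prints each value again. Depending on the exact format of the input, you may therefore need some preprocessing to get this right.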

For example, for the sample input in Charles Duffy’s answer (one JSON object per line), either:

  • use inputs (note that -n is needed; without it, jq takes the first object as its input (.) and inputs yields only the remaining ones, so the first duration is lost):

    jq -n '[inputs.duration] | add' <<< "$sample_data"
    
  • or slurp (-s) and iterate (.[]) / map:

    jq -s '[.[].duration] | add' <<< "$sample_data"
    jq -s 'map(.duration) | add' <<< "$sample_data"
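To see why the plain [.duration] | add form falls short on this kind of input, here is a small sketch with two hypothetical one-line documents: the first command runs the filter once per document, while the -n/inputs form collects every value into one array before adding.

```shell
# Two JSON documents, one per line, similar to what hadoop fs -cat emits here.
data='{"duration": 1}
{"duration": 2}'

# Filter runs once per document: prints 1, then 2 (no summing across documents).
printf '%s\n' "$data" | jq '[.duration] | add'

# -n plus inputs gathers all documents into one array first: prints 3.
printf '%s\n' "$data" | jq -n '[inputs | .duration] | add'
```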
    
törzsmókus
  • For me `jq '[.duration] | add' <<< "$sample_data"` does not produce the sum of the values but outputs the list of numbers. – Romain Oct 24 '20 at 04:02
  • @Romain I assume you came from [Charles Duffy’s answer](https://stackoverflow.com/a/52065738/501765); I am going to update mine with caveats for that kind of input – törzsmókus Oct 28 '20 at 11:14
  • Just a sidenote: giving an empty array to `add` produces `null`, not `0`, which might mess up subsequent operations. – Marki Dec 06 '20 at 12:41
  • I would write the last variant as `jq -s 'map(.duration) | add' <<< "$sample_data"` which is a bit more intuitive – Qrilka Jul 01 '22 at 07:04
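Regarding the note on empty input: a common jq idiom is the alternative operator //, which substitutes a default when the left side is null (or false). A minimal sketch, with empty stdin standing in for a directory that has no _stats.json files:

```shell
# No input documents at all: inputs yields nothing, so add sees [] and returns null.
printf '' | jq -n '[inputs | .duration] | add'

# // replaces that null with 0; a genuine sum of 0 is kept, since 0 is not null/false in jq.
printf '' | jq -n '[inputs | .duration] | add // 0'
```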
17

Another option (and one that works even if not all your durations are integers) is to make your jq code do the work:

sample_data='{"duration": 1211789}
{"duration": 1211789}
{"duration": 373585}
{"duration": 495379}
{"duration": 1211789}'

jq -n '[inputs | .duration] | reduce .[] as $num (0; .+$num)' <<<"$sample_data"

...properly emits as output:

4504331

To use it on real data, drop the <<<"$sample_data" and pipe the output of xargs hadoop fs -cat into jq on stdin instead.
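To illustrate the point about non-integer durations, a small sketch with hypothetical fractional values (bash's ((...)) arithmetic would reject these, whereas jq adds them as ordinary numbers):

```shell
# Hypothetical durations with fractional parts.
data='{"duration": 1.5}
{"duration": 2.25}'

# Same reduce as above, fed via a pipe instead of a here-string: prints 3.75.
printf '%s\n' "$data" | jq -n '[inputs | .duration] | reduce .[] as $num (0; . + $num)'
```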

Charles Duffy
14

awk to the rescue!

$ ... | awk '{sum+=$0} END{print sum}'

4504331
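The one-liner above adds $0, i.e. the whole line, so it relies on jq printing one number per line; if the numbers end up on a single line (as in the echo output shown in the question), awk converts only the leading number and the rest are ignored. A slightly more defensive sketch (the function name sum_fields is hypothetical) adds every whitespace-separated field and prints 0 rather than an empty line when there is no input:

```shell
# Sum every whitespace-separated number on every input line.
sum_fields() {
  awk '{ for (i = 1; i <= NF; i++) sum += $i } END { print sum + 0 }'
}

# Works for one number per line (jq's normal output)...
printf '%s\n' 1211789 1211789 373585 495379 1211789 | sum_fields
# ...and for all the numbers on a single line.
printf '%s\n' '1211789 1211789 373585 495379 1211789' | sum_fields
```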
karakfa
12

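You can just use add now, provided .duration is itself an array of numbers (add sums the elements of its input array, so on a plain number it fails with an error).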

jq '.duration | add'
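A minimal sketch of the case where this form applies, assuming (hypothetically) that each object's .duration is already an array of numbers; with the question's input, where .duration is a single number, add has nothing to iterate over and jq reports an error instead:

```shell
# Hypothetical input: .duration holds an array, so add sums its elements.
printf '%s\n' '{"duration": [1211789, 373585]}' | jq '.duration | add'
```

This prints 1585374.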
Timmmm
3

For clarity and generality, it might be worthwhile defining sigma(s) to add a stream of numbers:

... | jq -n '
  def sigma(s): reduce s as $x(0;.+$x); 
  sigma(inputs | .duration)'
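One practical benefit of sigma over add: reduce starts from 0, so an empty stream sums to 0 rather than null. A quick sketch, wrapping the filter in a hypothetical shell function sigma_durations for reuse:

```shell
# Wraps the sigma filter above; reads JSON objects with a .duration field on stdin.
sigma_durations() {
  jq -n 'def sigma(s): reduce s as $x (0; . + $x); sigma(inputs | .duration)'
}

printf '%s\n' '{"duration": 373585}' '{"duration": 495379}' | sigma_durations  # 868964
printf '' | sigma_durations                                                  # 0
```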
peak
  • While this might answer the author's question, it lacks some explanation and links to documentation. Raw code snippets are not very helpful without some words around them. You may also find [how to write a good answer](https://stackoverflow.com/help/how-to-answer) very helpful. Please edit your answer. – hellow Aug 29 '18 at 06:59
0

From a combination of other answers.

$ jq -n '[inputs | .duration] | add' <<< "$sample_data"

# 4504331

I had to collect the values into an array with [inputs | .duration] before summing them with add.

Romain
-1

Use a for loop.

total=0
for num in $(hadoop fs -ls -R /reports/dt=2018-08-27 | grep _stats.json | awk '{print $NF}' | xargs hadoop fs -cat | jq '.duration')
do
    ((total += num))
done
echo "$total"
Barmar
  • You really shouldn't write loops like that: https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_.2A.mp3.29 Note that most of that pipeline is redundant as well. – DTSCode Aug 28 '18 at 21:06
  • @DTSCode Those pitfalls are all about filenames, which could contain spaces. This pipeline only produces numbers, there shouldn't be a problem. I agree that the pipeline could be improved, but it wasn't necessary to address the main question. – Barmar Aug 29 '18 at 15:35
  • I'm well aware of what the pitfall is for. Like I said previously, my point is that you shouldn't write for loops in that manner. There is always a better solution, bash or otherwise. – DTSCode Aug 29 '18 at 18:46
  • Why shouldn't I write a for loop like this when I know I'm just looping over numbers? There's no need to write this using `while read -r num ...` – Barmar Aug 29 '18 at 19:13
  • And other than combining `grep` and `awk`, I'm not sure what's redundant. Note that I'm not familiar with the `hadoop` command, so there could be ways to simplify those parts that I don't know. – Barmar Aug 29 '18 at 19:23
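For comparison with the for loop, here is a sketch of the while read -r form mentioned in the comments (the function name sum_lines is hypothetical). It reads one value per line, so it does not depend on word splitting, but like the for loop it only handles integers because shell arithmetic is integer-only:

```shell
# Sum one integer per line from stdin; blank lines are skipped.
sum_lines() {
  total=0
  # The "|| [ -n "$num" ]" part also handles a last line with no trailing newline.
  while IFS= read -r num || [ -n "$num" ]; do
    [ -n "$num" ] || continue
    total=$((total + num))
  done
  echo "$total"
}

printf '%s\n' 1211789 1211789 373585 495379 1211789 | sum_lines
```

In the question's pipeline, sum_lines would replace the for loop at the end: ... | jq '.duration' | sum_lines.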