
I have a list with 2 columns.

host1 2
host2 33
host2 21
host1 1

I need to sum column 2 for each distinct value in column 1 and get output like:

host1 3
host2 54

How should I do this? Thanks.

James Brown

4 Answers


Just use awk:

$ awk '{a[$1]+=$2}END{for(i in a)print i,a[i]}' file
host1 3
host2 54

Explained:

$ awk '{
    a[$1] += $2         # Group on column 1 key, and sum column 2 values.
}
END {                   # When all lines done:
    for(i in a)         #   For each key:
        print i, a[i]   #     Output key and sum.
}' file
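Note that `for (i in a)` visits the keys in an unspecified order, so the totals may not come out in input order. Below is a sketch that also remembers the order in which each key first appears. The wrapper function name `sum_by_key` is purely illustrative and not part of the answer above.

```shell
#!/usr/bin/env bash
# Sum column 2 per key (column 1), printing keys in the order they first
# appear in the input instead of awk's unspecified for-in order.
# Hypothetical helper name; reads files given as arguments, or stdin.
sum_by_key() {
    awk '
    !($1 in sum) { order[++n] = $1 }   # first time we see this key: remember it
    { sum[$1] += $2 }                  # accumulate column 2 for this key
    END {
        for (i = 1; i <= n; i++)       # walk keys in first-seen order
            print order[i], sum[order[i]]
    }' "$@"
}

# Demo with the sample data from the question.
printf '%s\n' 'host1 2' 'host2 33' 'host2 21' 'host1 1' | sum_by_key
```

It works the same way with a file argument, e.g. `sum_by_key file`.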
paxdiablo
James Brown
  • Don't know, I'm only responsible for the upvote :-) I was about 97.3% toward typing in the exact same `awk` script when your answer appeared, so gave up. But I may just format the readable bit a bit better if you don't mind (just spacing). – paxdiablo Mar 04 '21 at 07:45
  • OK, cool. Never thought of commenting that way. – James Brown Mar 04 '21 at 07:49
  • Hi @JamesBrown, I treasure awk (like you, it seems) and your program is beautiful, but the OP asked for "Use bash", so I didn't think an awk program was a correct answer. The OP has since accepted your answer, so it seems that I was wrong, and I am withdrawing my downvote. – Allan Wind Mar 04 '21 at 09:14

bash implementation:

sum() {
    # read and accumulate input
    declare -A a
    while read -r k v
    do
        declare -i a["$k"]
        a["$k"]+=$v
    done

    # print accumulated result
    for k in "${!a[@]}"
    do
        echo "$k" "${a["$k"]}"
    done
}

sum <<EOF
host1 2
host2 33
host2 21
host1 1
EOF

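which yields this output (the order of the keys may vary, since bash does not guarantee an iteration order for associative arrays):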

host1 3
host2 54
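A more compact sketch, assuming bash 4 or later and integer values in column 2: declaring the array with `-Ai` gives every element the integer attribute up front, so `+=` adds instead of concatenating and the per-key `declare -i` is no longer needed. The function name `sum_by_host` is hypothetical (it also avoids shadowing the coreutils `sum` command), and the output is piped through `sort` because associative-array iteration order is unspecified.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays). Assumes column 2 holds integers.
sum_by_host() {
    # -A: associative array keyed by column 1.
    # -i: integer attribute, so += on an element adds arithmetically
    #     instead of concatenating strings; unset elements start at 0.
    local -Ai totals=()
    local key value

    # Read "key value" pairs from stdin and accumulate per key.
    while read -r key value; do
        totals[$key]+=$value
    done

    # Print one "key total" line per key, sorted for stable output.
    for key in "${!totals[@]}"; do
        printf '%s %s\n' "$key" "${totals[$key]}"
    done | sort
}

# Demo with the sample data from the question.
sum_by_host <<'EOF'
host1 2
host2 33
host2 21
host1 1
EOF
```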
Allan Wind
  • Perhaps easier to read, if you would write the summing statement as `((a[k]+=v))` – user1934428 Mar 04 '21 at 08:38
  • @user1934428 yes, you are right, that is easier to read. Needed to declare the variable an integer though. – Allan Wind Mar 04 '21 at 08:46
  • Declaring was not necessary for me. Which bash version are you using? What error do you get, if the declaration is missing? – user1934428 Mar 04 '21 at 09:09
  • I think your first declaration of `a` has a problem. I would just initialize the array with `a=()`. Your `-A` makes it an associative array, and this could lead to trouble later. After all, you also don't need to declare `v` as integer. – user1934428 Mar 04 '21 at 09:11
  • I need to declare it an associative array; otherwise it will be a one-dimensional indexed array, $k will be cast to an integer with value 0, and you cannot extract the keys in the 2nd loop. It doesn't matter whether you use a=() or just assign to it with a["$k"]+=$v. And if I don't set the integer attribute on the array entries, then += does string concatenation. I tried a few variations and couldn't get your ((a[k]+=v)) to work. – Allan Wind Mar 04 '21 at 09:23
  • Ah, I get the point. Of course you need separate sums for each key. Still, with my bash version (4.4.12), I don't need to declare it as integer. What kind of problem do you run into if you drop the `declare -i a["$k"]`? – user1934428 Mar 04 '21 at 09:29
  • `a[k]=1; echo "${!a[@]}"; unset a; declare -A a; a[k]=1; echo "${!a[@]}"` gives me 0 and k with bash 5.0.3(1)-release – Allan Wind Mar 04 '21 at 09:34
  • I get _k_ printed twice instead for your example. I never see the `0` you get. I wonder what has changed in bash between these versions to cause this effect. It's a bit creepy when upward compatibility is broken. – user1934428 Mar 04 '21 at 09:38

Let's say the content is in the file temp.txt.

Solution:

awk '{seen[$1]+=$2;}END{for(indx in seen) print indx" " seen[indx];}' temp.txt

Output:

host1 3
host2 54
Shubham Saroj

Here is how you can add up the numbers in the second column:

numbers=$(cat file | cut -f 2 -d " ")
sum=0
for i in $numbers
do
sum=$((sum+i))
done
echo $sum

`cut` extracts the numbers in the second column, and the for loop adds them up. After executing this script you will get:

[user@host ~]$ ./shell_name.sh
57
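Note that this adds up every value in column 2 (2 + 33 + 21 + 1 = 57) rather than producing one total per host. If a single grand total is really what you want, here is a sketch that reads the input line by line with `while read` instead of expanding all values into one `for` list, and that takes the file via redirection rather than `cat`. The function name `total_column2` is hypothetical, and column 2 is assumed to hold integers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total of column 2 over all input lines, read one line
# at a time so no single word list of all values is built.
total_column2() {
    local sum=0 _key value
    while read -r _key value; do   # _key (column 1) is ignored
        sum=$(( sum + value ))
    done
    echo "$sum"
}

# Demo with the sample data from the question.
printf '%s\n' 'host1 2' 'host2 33' 'host2 21' 'host1 1' | total_column2
```

With a file, call it as `total_column2 < file`.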
Learner
  • Aside from the [useless use of cat](https://stackoverflow.com/questions/11710552/useless-use-of-cat), your solution would fail for very large files (i.e. where the expansion in the for-loop exceeds the maximum allowed length). Since there are other ways to solve this (see in particular the answer by Allan Wind), I would not recommend your approach. – user1934428 Mar 04 '21 at 08:42
  • In addition, you are summing up **all** the values in column 2, which is not what the OP wants. – user1934428 Mar 04 '21 at 09:30