
I have a command (cmd1) that greps through a log file to extract a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse-sorted list. There may be duplicates within this sorted list, and I need to find the count for each unique number in it.

For example, if the output of cmd1 is:

100 
100 
100 
99 
99 
26 
25 
24 
24

I need another command that I can pipe the above output to, so that I get:

100     3
99      2
26      1
25      1
24      2

7 Answers


How about:

$ echo "100 100 100 99 99 26 25 24 24" \
    | tr " " "\n" \
    | sort \
    | uniq -c \
    | sort -k2nr \
    | awk '{printf("%s\t%s\n",$2,$1)}END{print}'

The result is:

100 3
99  2
26  1
25  1
24  2
    I ran this and it produced an extra print statement of $1,$2 at the end: `100 3 99 2 26 1 25 1 24 2 2 24` – Mittenchops Mar 25 '13 at 16:46
  • 3
    The following adds a new line between the results and removes the extra line at the end: `echo "100 100 100 99 99 26 25 24 24" | tr " " "\n" | sort | uniq -c | sort -k2nr | awk '{printf("%s\t%s\n",$2,$1)}END{print}' | head -n -1` so you get: `100 3 99 2 26 1 25 1 24 2 ` – Woody May 27 '16 at 16:24
  • Note about syntax, you can end a line with a pipe instead of using a backslash. – wjandrea Jul 14 '19 at 03:03
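
As the comments point out, the END{print} block re-emits the last record read; dropping it avoids the stray line at the end:

$ echo "100 100 100 99 99 26 25 24 24" \
    | tr " " "\n" \
    | sort \
    | uniq -c \
    | sort -k2nr \
    | awk '{printf("%s\t%s\n",$2,$1)}'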

uniq -c works for GNU uniq 8.23 at least, and does exactly what you want (assuming sorted input).
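
A minimal sketch of the full pipeline, assuming cmd1 emits one number per line as in the question:

cmd1 | sort -gr | uniq -c

which prints something like:

      3 100
      2 99
      1 26
      1 25
      2 24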


If order is not important:

# echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1
  • +1 for doing this with 3 less pipes. It would be awesome if you could elaborate on how this works b/c it confused me. ;-) Thanks. – SaxDaddy Oct 27 '14 at 02:47
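
To elaborate on how this works, here is the same awk program expanded with comments:

awk '{
    # for every whitespace-separated field on each input line,
    # bump the count stored under that value in array a
    for (i = 1; i <= NF; i++)
        a[$i]++
}
END {
    # once all input is read, print each distinct value with its count;
    # "for (o in a)" visits keys in no particular order, hence
    # "if order is not important"
    for (o in a)
        printf "%s %s ", o, a[o]
}'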

Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.

printf '%d\n' 100 99 26 25 100 24 100 24 99 \
   | sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'
100     3
99      2
26      1
25      1
24      2

In Bash, we can use an associative array to count instances of each input value. Assuming we have the command $cmd1, e.g.

#!/bin/bash

cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'

Then we can count values in the array variable a using the ++ mathematical operator on the relevant array entries:

declare -A a    # make a an associative array, as described above

while read -r i
do
    ((++a["$i"]))
done < <($cmd1)

We can print the resulting values:

for i in "${!a[@]}"
do
    echo "$i ${a[$i]}"
done

If the order of output is important, we might need an external sort of the keys:

for i in $(printf '%s\n' "${!a[@]}" | sort -nr)
do
    echo "$i ${a[$i]}"
done
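
Putting it all together, a sketch of the complete script (reusing the sample $cmd1 defined above):

#!/bin/bash

cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'

declare -A a    # counts, keyed by input value

while read -r i
do
    ((++a["$i"]))
done < <($cmd1)

# print keys in descending numeric order
for i in $(printf '%s\n' "${!a[@]}" | sort -nr)
do
    echo "$i ${a[$i]}"
done

which should print:

100 3
99 2
26 1
25 1
24 2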

If you have the input stored in my_file, you can do:

sort -nr my_file | uniq -c | awk ' { t = $1; $1 = $2; $2 = t; print; } '

Otherwise, just pipe the input to be processed into the same command.

Explanation:

  • sort -nr sorts the input numerically (-n) in reverse order (-r)
  • uniq -c counts duplicates and shows the count alongside each value
  • awk '{ t = $1; $1 = $2; $2 = t; print; }' swaps the two columns
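
For example, with the question's sample numbers saved to my_file, this should produce:

$ printf '%s\n' 100 100 100 99 99 26 25 24 24 > my_file
$ sort -nr my_file | uniq -c | awk ' { t = $1; $1 = $2; $2 = t; print; } '
100 3
99 2
26 1
25 1
24 2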

Ruby has built-in tools to do this very efficiently from the command line.

Example, given this file:

$ cat file
100 
100 
100 
99 
99 
26 
25 
24 
24
1
Suppose you want to:

  1. Count each;
  2. Sort by a) decreasing occurrence b) decreasing value;
  3. Put in lined up columns.

This Ruby does that:

ruby  -e '
cnt=Hash.new(0)
$<.each{|x| cnt[x.to_i]+=1}
w1,w2=cnt.max_by{|e| e.to_s.length}.map{|e| e.to_s.length+2}
cnt.sort_by{|k,v| [-v,-k]}.each{|k,v| 
            puts "#{k.to_s.rjust(w1," ")}\t#{v.to_s.rjust(w2," ")}"
}' file

Prints:

  100     3
   99     2
   24     2
   26     1
   25     1
    1     1

The input file does not need to be sorted.
