Awk/Unix group by

Question

I have this text file:

name, age
joe,42
jim,20
bob,15
mike,24
mike,15
mike,54
bob,21

I'm trying to get this (a count per name):

joe 1
jim 1
bob 2
mike 3

Thanks,

score 113 · Accepted Answer · edited Apr 26 '19 at 21:53

$ awk -F, 'NR>1{arr[$1]++}END{for (a in arr) print a, arr[a]}' file.txt
joe 1
jim 1
mike 3
bob 2

EXPLANATIONS

-F, sets the field separator to a comma
NR>1 skips the header and processes only the lines after line 1
arr[$1]++ increments the entry of the array arr whose key is the first field (the name)
END{} runs once, after the whole file has been read
for (a in arr) loops over the keys of arr, with a holding the current key
print a, arr[a] prints the key followed by its count
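
The one-liner above prints names in whatever order awk walks the array. If the first-seen order from the question (joe, jim, bob, mike) matters, one option is to record each new name in a second, numerically indexed array. The following is a minimal sketch, not part of the original answer; the helper name count_in_order and the arrays order and count are made up for illustration.

```shell
# Hypothetical helper: count rows per name, keeping first-seen order.
count_in_order() {
    awk -F, '
        NR > 1 {
            if (!($1 in count)) order[++n] = $1   # remember each new name once
            count[$1]++
        }
        END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }
    ' "$1"
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'name, age' joe,42 jim,20 bob,15 mike,24 mike,15 mike,54 bob,21 > "$sample"

count_in_order "$sample"
# joe 1
# jim 1
# bob 2
# mike 3
rm -f "$sample"
```

This uses only POSIX awk features, so it should not depend on GNU awk.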

edited Apr 26 '19 at 21:53

ahmet alp balkan


answered Feb 17 '13 at 00:50

Gilles Quénot


+1 for a one line awk answer (which was the tag in the question)! I love learning here... – Floris Feb 17 '13 at 00:53
Any comment why "mike" is printed before "bob", when the first occurrence of "bob" is before "mike" in the file?... – Floris Feb 17 '13 at 00:55
Arrays are arbitrarily sorted in `awk`. So, the output order is not guaranteed. – nneonneo Feb 17 '13 at 01:03
I see now, NR skips the 1st line, everything after END runs only once. thx! – C B Feb 17 '13 at 01:10
A small modification allows you to SUM the ages instead of just counting records: `awk -F, 'NR>1{arr[$1]+=$2}END{for (a in arr) print a, arr[a]}' file.txt` – Dave Sep 20 '15 at 00:10
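
Building on the sum idea in the comment above, counting and summing can be combined to get a mean age per name. This is only a sketch under the question's file layout (name in field 1, age in field 2); the helper name age_stats and the array names n and sum are hypothetical.

```shell
# Hypothetical helper: per-name count, total age and mean age.
age_stats() {
    awk -F, '
        NR > 1 { n[$1]++; sum[$1] += $2 }
        END    { for (k in n) print k, n[k], sum[k], sum[k] / n[k] }
    ' "$1" | sort    # sort only to make the arbitrary array order stable
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'name, age' joe,42 jim,20 bob,15 mike,24 mike,15 mike,54 bob,21 > "$sample"

age_stats "$sample"
# bob 2 36 18
# jim 1 20 20
# joe 1 42 42
# mike 3 93 31
rm -f "$sample"
```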

score 30 · Answer 2 · answered Feb 17 '13 at 00:44

Strip the header row, drop the age field, group the same names together (sort), count identical runs, output in desired format.

tail -n +2 txt.txt | cut -d',' -f 1 | sort | uniq -c | awk '{ print $2, $1 }'

output

bob 2
jim 1
joe 1
mike 3

answered Feb 17 '13 at 00:44

nneonneo


+1 for a fast and compact answer! I was only halfway through... And you give it in alphabetical order (wasn't asked...) – Floris Feb 17 '13 at 00:46
We'll see how OP wants it sorted, if at all. (To sort by the count, stick a `sort -n` before the `awk`). – nneonneo Feb 17 '13 at 00:47
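
As a variation on the sorting suggestion in the comment above, the pipeline can also sort after the final awk step, which makes it easy to put the most frequent name first and break ties alphabetically. This is a sketch rather than the answerer's own command; the helper name count_by_frequency is hypothetical.

```shell
# Hypothetical helper: names with counts, most frequent first.
count_by_frequency() {
    tail -n +2 "$1" | cut -d, -f1 | sort | uniq -c \
        | awk '{ print $2, $1 }' \
        | sort -k2,2nr -k1,1   # count descending, then name ascending
}

# Sample data from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'name, age' joe,42 jim,20 bob,15 mike,24 mike,15 mike,54 bob,21 > "$sample"

count_by_frequency "$sample"
# mike 3
# bob 2
# jim 1
# joe 1
rm -f "$sample"
```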

score 10 · Answer 3 · answered Feb 17 '13 at 01:50

It looks like you want sorted output. You could simply pipe or print into sort -nk 2:

awk -F, 'NR>1 { a[$1]++ } END { for (i in a) print i, a[i] | "sort -nk 2" }' file

Results:

jim 1
joe 1
bob 2
mike 3

However, if you have GNU awk installed, you can do the sorting inside awk itself, without an external sort. Here's a single-process solution that sorts the array by its values. It should still be quite quick. Run it like this:

awk -f script.awk file

Contents of script.awk:

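BEGIN {
    FS=","
}

# Skip the header, then count how often each name appears
NR>1 {
    a[$1]++
}

END {
    # Build keys of the form "count SUBSEP name"; the count is zero-padded
    # because asorti() compares indices as strings by default
    for (i in a) {
        b[sprintf("%010d", a[i]),i] = i
    }

    # asorti() (GNU awk only) replaces b with its sorted indices
    n = asorti(b)

    # Recover the name from each sorted key
    for (i=1;i<=n;i++) {
        split (b[i], c, SUBSEP)
        d[++x] = c[2]
    }

    for (j=1;j<=n;j++) {
        print d[j], a[d[j]]
    }
}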

Results:

jim 1
joe 1
bob 2
mike 3

Alternatively, here's the one-liner:

awk -F, 'NR>1 { a[$1]++ } END { for (i in a) b[sprintf("%010d", a[i]),i] = i; n = asorti(b); for (i=1;i<=n;i++) { split (b[i], c, SUBSEP); d[++x] = c[2] } for (j=1;j<=n;j++) print d[j], a[d[j]] }' file

score 4 · Answer 4 · answered Feb 17 '13 at 00:49

A strictly awk solution...

BEGIN { FS = "," }
{ ++x[$1] }
END { for(i in x) print i, x[i] }

If name, age is really in the file, you could adjust the awk program to ignore it...

BEGIN   { FS = "," }
/[0-9]/ { ++x[$1] }
END     { for(i in x) print i, x[i] }

answered Feb 17 '13 at 00:49

DigitalRoss


Liking the use of the /[0-9]/ address to work only with lines with an age in it... – Floris Feb 17 '13 at 00:56

score 0 · Answer 5 · answered Jun 29 '20 at 08:54

I came up with two functions based on the answers here:

# Top 10 commands by total %CPU (output columns: command, %CPU, %MEM)
topcpu() {
    top -b -n1                                                                                  \
        | tail -n +8                                                                            \
        | awk '{ print $12, $9, $10 }'                                                          \
        | awk '{ CPU[$1] += $2; MEM[$1] += $3 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
        | sort -k2 -n                                                                           \
        | tail -n 10                                                                            \
        | column -t                                                                             \
        | tac
}

# Top 10 commands by total %MEM
topmem() {
    top -b -n1                                                                                  \
        | tail -n +8                                                                            \
        | awk '{ print $12, $9, $10 }'                                                          \
        | awk '{ CPU[$1] += $2; MEM[$1] += $3 } END { for (k in CPU) print k, CPU[k], MEM[k] }' \
        | sort -k3 -n                                                                           \
        | tail -n 10                                                                            \
        | column -t                                                                             \
        | tac
}

$ topcpu
top              12.5  0
Xorg             6.2   1.6
ibus-x11         6.2   0.7
gnome-shell      6.2   7
chrome           6.2   74.6
adb              6.2   0.1
zsh              0     2.2
xdg-permission-  0     0.2
xdg-document-po  0     0.1
xdg-desktop-por  0     0.4

$ topmem
chrome           0    75.6
gnome-shell      6.2  7
mysqld           0    4.2
zsh              0    2.2
deluge-gtk       0    2.1
Xorg             0    1.6
scrcpy           0    1.6
gnome-session-b  0    0.8
systemd-journal  0    0.7
ibus-x11         6.2  0.7

enjoy!

score 0 · Answer 6 · edited Apr 10 '22 at 18:29

tail -n +2 file.txt | cut -d',' -f 1 |
sort | uniq -c

2 bob
1 jim
1 joe
3 mike

edited Apr 10 '22 at 18:29

tripleee


answered Nov 25 '20 at 17:14

Ajay Ahuja

