awk count number of occurrences of each character in entire file

Question

I have a text file and would like to count the number of occurrences of each character in it.

Below is an example of what my file looks like:

#1=DBD?BFHH=FIIIHIIGIHGHHIIIIIIIIGG?CHIIIAGGGHIGHEEHB@BDBCEDDDDD@CCA>?A>@C>:<?CCDDDDD@CD@DCBD9?CCDCB@
#1=DDFFFHFDHHIIIIJJIGHJIJGIIIIEGHGHJJBFGFHEIEEG@FFHJ.=EHHHABDDDBCCECEEEEDCBDEDDDDDDDDCDD?B9B:A:@?CCCD

So the output would be:

E - 10, C - 20, (#) - 10, 3 - 9
etc etc...

I hope I was clear enough in what I want.

Thanks!

I am somewhat new to awk and have spent quite some time reading up on it and searching for solutions. There was more to it, but I got the first half on my own; the question I posted is the second half. – Sinan Dec 07 '14 at 18:40

Gilles Quénot · Accepted Answer · 2014-12-07T19:09:58.940

1

$ awk '{for (i=1; i<=NF; i++){a[$i]++}}END{for (i in a){print i, a[i]}}' FS= file
A 5
B 13
C 20
D 36
E 14
9 2
F 10
: 3
G 14
. 1
H 21
< 1
I 29
J 7
= 4
# 2
> 3
1 2
? 7
@ 8

edited Dec 07 '14 at 19:09

answered Dec 06 '14 at 22:26

IMO, it's somewhat cleaner to put the `FS=""` in a `BEGIN` block. It avoids dealing with shell details. – D.Shawley Dec 07 '14 at 20:46
@D.Shawley you would be in the minority. If you are writing an awk script it makes sense, but on the command line all it does is add 8 characters to do the same thing. – Zombo Dec 07 '14 at 22:46
@StevenPenny fair enough. I also add newlines when I write commands on the command line so my "one liner" awk command lines tend to span multiple lines ;) – D.Shawley Dec 08 '14 at 01:42
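
For reference, the `BEGIN` form D.Shawley describes might look like the sketch below, written across several lines as he mentions. It is only an illustration: the `count_chars` wrapper name is made up here, and splitting on an empty `FS` is a common extension (documented by gawk and mawk) that POSIX does not specify, so other awks may behave differently.

```shell
# Hypothetical wrapper: count every character in the named files (or stdin).
count_chars() {
    awk '
        BEGIN { FS = "" }                            # one field per character
        { for (i = 1; i <= NF; i++) count[$i]++ }    # tally each field
        END { for (c in count) print c, count[c] }   # traversal order is unspecified
    ' "$@"
}

# Usage: pipe through sort for a stable listing.
printf 'abca\nb\n' | count_chars | sort
```

Because `FS` is set inside the program, nothing has to follow the script on the command line except the file names.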

score 1 · Answer 2 · answered Dec 07 '14 at 00:56

If you need to count the characters across all lines:

sed 's/\(.\)/\1\n/g' infile|sort |uniq -c |sort -n

      1 .
      1 <
      2
      2 #
      2 1
      2 9
      3 :
      3 >
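
The `\n` in the replacement part of the `s` command is understood by GNU sed but is not portable to every sed. As a sketch of an alternative under that assumption, the POSIX utility `fold -w 1` can do the one-character-per-line split instead. The name `char_histogram` is made up for this example, and `fold` is only expected to split cleanly on plain single-byte text like the sample above.

```shell
# Hypothetical helper: one character per line, then count and sort by frequency.
char_histogram() {
    fold -w 1 "$@" | sort | uniq -c | sort -n
}

# Usage: reads the named files, or stdin when none are given.
printf 'aab\n' | char_histogram
```

Unlike the sed version, this should not add an extra blank entry for each non-empty input line, because no trailing newline is appended after the last character.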

If you need to count the characters on each line separately:

awk -v FS="" '{delete a;for (i=1;i<=NF;i++) a[$i]++;for (i in a) printf "%s - %s, ",i,a[i];printf RS}' infile

A - 3, B - 7, C - 12, D - 17, E - 3, 9 - 1, F - 2, : - 1, G - 8, H - 10, < - 1, I - 18, = - 2, # - 1, > - 3, 1 - 1, ? - 5, @ - 6,
A - 2, B - 6, C - 8, D - 19, E - 11, 9 - 1, F - 8, : - 2, G - 6, . - 1, H - 11, I - 11, J - 7, = - 2, # - 1, 1 - 1, ? - 2, @ - 2,

Let's see. Sometimes you have to guess the real request. – BMW Dec 07 '14 at 09:19

score 0 · Answer 3 · answered Dec 07 '14 at 01:10

Perl is very good for this kind of thing. Read the file as a single string, remove newlines, count the letters, output the results sorted by letter.

perl -0777 -nE 's/\n//g; $c{$_}++ for split //; say "$_ $c{$_}" for sort keys %c' file

# 2
. 1
1 2
9 2
: 3
< 1
= 4
> 3
? 7
@ 8
A 5
B 13
C 20
D 36
E 14
F 10
G 14
H 21
I 29
J 7

score 0 · Answer 4 · edited May 23 '17 at 12:34

GNU awk 4.1

awk -iwalkarray '{for (;NF;NF--) b[$NF]++} END {walk_array(b)}' FS=

[A] = 5
[B] = 13
[C] = 20
[D] = 36
[E] = 14
[F] = 10
[9] = 2
[G] = 14
[:] = 3
[.] = 1
[H] = 21
[I] = 29
[<] = 1
[J] = 7
[#] = 2
[=] = 4
[1] = 2
[>] = 3
[?] = 7
[@] = 8

If you have an earlier version of GNU awk, you can use `for (c in b) print c, b[c]` instead. I noticed that `walk_array` had never been used on Stack Overflow, so I did it for fun. I found my awk files at /usr/share/awk and /usr/lib/gawk.
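
Spelled out, the fallback for awks without the `walkarray` library could look like the following sketch. It keeps the answer's `for (;NF;NF--)` loop and array name `b`; the `count_chars_plain` wrapper is a name made up for this example, and `-v FS=` is used here in place of the trailing `FS=` argument. The traversal order of `for (c in b)` is unspecified, hence the `sort`.

```shell
# Hypothetical wrapper around the plain for-in version of the answer.
count_chars_plain() {
    awk -v FS= '
        { for (; NF; NF--) b[$NF]++ }        # count the last field, then drop it
        END { for (c in b) print c, b[c] }   # print each character and its count
    ' "$@"
}

# Usage: reads the named files, or stdin when none are given.
printf 'abca\n' | count_chars_plain | sort
```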

answered Dec 07 '14 at 06:46

Zombo

I was just about to ask about that. I cannot find any documentation about `walk_array` in the `gnu awk` manual. Can you point me in the right direction? I'd like to learn :) – Jotne Dec 07 '14 at 09:33
Can you try this and see if it works: `{while(--NF) z[$NF]++}`? – Jotne Dec 07 '14 at 09:45
@Jotne that cannot work because you will lose the last character on each line – Zombo Dec 07 '14 at 09:48
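
To make Zombo's point concrete, here is a small sketch that runs both loops on a single non-empty line. The function names are invented for the illustration. `while (--NF)` decrements before the first `$NF` lookup, so the original last field is never counted, whereas `for (; NF; NF--)` reads `$NF` first and decrements afterwards.

```shell
# Hypothetical helpers; both print "character count" pairs.
loop_predecrement() {
    awk -v FS= '{ while (--NF) z[$NF]++ } END { for (c in z) print c, z[c] }' "$@"
}
loop_postdecrement() {
    awk -v FS= '{ for (; NF; NF--) z[$NF]++ } END { for (c in z) print c, z[c] }' "$@"
}

printf 'abc\n' | loop_predecrement  | sort   # counts a and b only; c is skipped
printf 'abc\n' | loop_postdecrement | sort   # counts a, b and c
```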
