How can I count the frequency of duplicate data?

Question

I need help identifying how to count the frequency of duplicate information in a file. For example:

I would like a UNIX command that tells me how many distinct numbers are repeated exactly 2 times within a file, and how many distinct numbers are repeated more than 2 times.

For example, this command would use the above data and yield an output that tells me there were 2 unique numbers repeated 2 times in the file (0 and 14 each two times in the data set) and 1 unique number that was repeated more than 2 times in the file (10 occurred more than two times in the data set).

Are the numbers one per line or several, and are there only numbers in the file? — Wintermute, Jan 13 '15 at 16:20
Please check the answer in this link http://stackoverflow.com/questions/6712437/find-duplicate-lines-in-a-file-and-count-how-many-time-each-line-was-duplicated Thanks, Anand — Anand Gangadhara, Jan 13 '15 at 16:34
Anand - This is not what I am looking for. Thank you though. I need an output similar to what this command would do: awk '{a[$0]++}END{for(x in a)b[a[x]]++;for(x in b)print b[x], x}' filename — bjb125, Jan 13 '15 at 16:39
The above command would yield the following output: 2 2 and 1 3, meaning that there were 2 instances where a number was repeated twice and 1 instance where a number was repeated 3 times. I want the output to show only how many numbers were repeated exactly 2 times. I then want another command to show only how many numbers were repeated more than 2 times. — bjb125, Jan 13 '15 at 16:44
edit your question to show the ACTUAL output you would expect given that specific input file. Don't just try to describe it in comments. — Ed Morton, Jan 13 '15 at 19:22

score 1 · Answer 1 · answered Jan 13 '15 at 17:08

If you just want to know there were 2 numbers that appeared twice and 1 number that appeared thrice:

sort file | uniq -c | awk '{print $1}' | sort | uniq -c

  2 2
  1 3
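
If the two figures are wanted separately (how many values occur exactly twice, and how many occur more than twice), one possible sketch is to filter the counts from `uniq -c` with awk instead of tallying them again. The sample file here is made up to match the description in the question (0 and 14 twice, 10 three times), and the variable names `sample`, `twice` and `more` are illustrative only.

```shell
# Hypothetical sample input: 0 and 14 appear twice, 10 appears three times.
sample=$(mktemp)
printf '%s\n' 0 0 14 14 10 10 10 > "$sample"

# uniq -c prefixes each distinct line with its count, so $1 is the count.
# Number of distinct values that occur exactly twice:
twice=$(sort "$sample" | uniq -c | awk '$1 == 2 { n++ } END { print n + 0 }')

# Number of distinct values that occur more than twice:
more=$(sort "$sample" | uniq -c | awk '$1 > 2 { n++ } END { print n + 0 }')

echo "repeated exactly twice: $twice"
echo "repeated more than twice: $more"

rm -f "$sample"
```

The `n + 0` in each END block makes awk print 0 rather than an empty line when no value matches the condition.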

If you want to know what the numbers are, I'd use perl:

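perl -lne '
        # count occurrences of each line; the "} END {" below closes the implicit -n loop
        $n{$_}++
    } END {
        # group the distinct values by how many times each occurred
        push @{$aggregate{$n{$_}}}, $_ for keys %n;
        $,="\t";
        # print: count, number of values with that count, the values themselves
        print $_, scalar(@{$aggregate{$_}}), join(",",@{$aggregate{$_}}) for keys %aggregate
' file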

outputs

3   1   10
2   2   0,14

Ed Morton · Answer 2 · 2015-01-13T19:38:31.677

$ cat tst.awk
{ cnt[$0]++ }
END {
    for (key in cnt)
        hits[cnt[key]]++

    for (c in hits)
        print hits[c], c
}
$
$ awk -f tst.awk file
2 2
1 3

And if you want to know which values are associated with which counts:

$ cat tst.awk
{ cnt[$0]++ }
END {
    for (key in cnt) {
        c = cnt[key]
        hits[c]++
        vals[c] = (c in vals ? vals[c] "," : "") key
    }

    for (c in hits)
        print hits[c], c, vals[c]
}
$
$ awk -f tst.awk file
2 2 0,14
1 3 10
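
The same counting array can also answer the "exactly 2 times" versus "more than 2 times" split from the comments in a single pass. This is only a sketch along the lines of the scripts above: the sample file is invented to match the question's description, and the output labels `twice` and `more_than_twice` are assumptions, not something the question specifies.

```shell
# Hypothetical sample input: 0 and 14 appear twice, 10 appears three times.
sample=$(mktemp)
printf '%s\n' 0 0 14 14 10 10 10 > "$sample"

result=$(awk '
    { cnt[$0]++ }                       # occurrences of each distinct line
    END {
        for (key in cnt) {
            if (cnt[key] == 2) {
                twice++                 # value seen exactly two times
            } else if (cnt[key] > 2) {
                more++                  # value seen three or more times
            }
        }
        # "+ 0" forces a numeric 0 when a counter was never incremented
        print "twice", twice + 0
        print "more_than_twice", more + 0
    }
' "$sample")

echo "$result"
rm -f "$sample"
```

With the sample input this prints `twice 2` and `more_than_twice 1`. Values that occur only once fall through both branches and are not counted.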
