Awk - Count Each Unique Value and Match Values Between 2 Files

Question

I have two files. I am trying to get the count of each unique field in column 8 in file 1, and then match the unique field value from the 6th column of the 2nd file.

So essentially, I am trying to -> take each unique value and value count from column 8 from File1, if there is a match in column6 of file2

File1:

2020-12-23 23:59:12,235911688,\N,34,20201223233739,797495497,404,819,\N,
2020-12-23 23:59:12,235911419,\N,34,265105814,718185263,200,819,\N,
2020-12-23 23:59:12,235912029,\N,34,20201223233739,748362773,404,819,\N,
2020-12-23 23:59:12,235911839,\N,34,20201223233738,745662697,404,400,\N,
2020-12-23 23:59:12,235911839,\N,34,20201223233738,745662697,404,400,\N,
2020-12-24 23:59:12,235911839,\N,34,20201223233738,745662697,404,400,\N,

File2:

public static String status_code                             =   "819";
public static String DeActivate                                  = "400";

Expected output:

total count of status_code,819 : 3
total count of DeActivate,400 : 3

My code:

awk 'NR==FNR{a[$8]++}NR!=FNR{gsub(/"/,"",$6);b[$6]=$0}END{for( i in b){printf "Total count of %s,%d : %d\n",gensub(/^([^ ]+).*/,"\\1","1",b[i]),i,a[i]}}' File1 File2

Algorithm

1.Take the 8th feild from 1st file:(eg:819)

2.Count how time unique feild(819) occurs in file(based of date)

3 take the corresponding value of 819 from 4th feild of file2

4 print output together

I believe I should be able to do this with awk, but for some reason I am really struggling with this.

Sorry but this is NOT clear, please edit your question is with more clear samples by using CODE TAGS please and let us know then. Eg--> `\N` is really a literal character in your file? — RavinderSingh13, Dec 29 '20 at 07:34
yes just take the unique 8th unique field count from the first file and take the description of the 8th feild from the second file and print unique feild, feild description and feild count togethe @RavinderSigh13 — Nithin justin, Dec 29 '20 at 07:44
I understand, we all are here to learn. But if you don't give your exact samples(which are changing by your questions) then no one will be able to help, so its better to keep your samples clear. Like I given one example is `\N` character is really present in your samples? Please edit your samples better and use CODE TAGS to cover them(like how pii_ke did edit now), let us know once post is more clear. — RavinderSingh13, Dec 29 '20 at 07:50
_count of each unique field in column 8 **based on date at column 1**._ How does this affect the output? Currently I don't see it affecting it at all in the expected output. Will this lead to yet another question on the same subject? — James Brown, Dec 29 '20 at 11:10
I separate the contents of the file by their dates( i.e. if cdr has field1 with multiple dates like 2020-12-24, 2020-12-20 etc. move them in separate files)(just bcoz i am noob in coding)@james Brown — Nithin justin, Dec 29 '20 at 11:32
Please stop ending every question you post with the same sentence `I believe I should be able to do this with awk, but for some reason I am really struggling with this one.`. We know you think you should be able to do it with awk or you wouldn't be trying to do it with awk and tagging the question with awk and we know you're struggling with it or you wouldn't be asking for help, — Ed Morton, Dec 29 '20 at 16:12

pii_ke · Accepted Answer · 2020-12-29T13:21:13.650

0

(It is something like SQL JOINing two relational database tables on File1's $8 being equal to File2's $6.)

awk '
NR==FNR {    # For the first file
    a[$8]++; # count each $8
}
NF&&NR!=FNR {             # For non empty lines of file 2
    gsub(/[^0-9]/,"",$6); # remove non-digits from $6
    b[$6]=$4              # save name of constant to b
}
END{
    for(i in b){   # for constants occurring in File2
        if(a[i]) { # if File1 had non zero count
            printf( "Total count of %s,%d : %d\n",b[i],i,a[i]);
                   #print data
        }
    }
}' "FS=," File1 FS=" " File2

The above code works with your sample input. It produces the following output:

Total count of DeActivate,400 : 3
Total count of status_code,819 : 3

I think the main problem is that you do not specify comma as field separator for File1. See Processing two files with different field separators in awk

edited Dec 29 '20 at 13:21

answered Dec 29 '20 at 08:12

pii_ke

2,811
2
20
30

code works fine.but it print all values description from file2.just need the description of matching values from file1 ..is is possible to exclude items from the reuslt that have count 0.thanks in advance – Nithin justin Dec 29 '20 at 08:32
@Nithinjustin I've edited. I think it does not print output for count zero now. – pii_ke Dec 29 '20 at 08:39
Thanks a Alot! its works.if u dont mind can u explain the code..it will be helpful for everyone – Nithin justin Dec 29 '20 at 08:42
I'll add a few comments later. – pii_ke Dec 29 '20 at 09:54
That `"FS=,"` is a bit unusual way to set it seems to work. – James Brown Dec 29 '20 at 11:37

score 0 · Answer 2 · answered Dec 29 '20 at 15:30

A shorter, more efficient, way without the second array and for loop:

$ cat demo.awk
NR == FNR {
    a[$8]++
    next
}

{
    gsub(/[^0-9]/,"",$6)
    printf "Total count of %s,%d : %d\n", $4, $6, a[$6]
}

$ awk -f demo.awk FS="," file1 FS=" " file2
Total count of status_code,819 : 3
Total count of DeActivate,400 : 3
$

Awk - Count Each Unique Value and Match Values Between 2 Files

2 Answers2