how to count number of fields for a single column in a row separated by 2 field separators (":" and ",")?

Question

given the text file:
(the structure is: "group_name:pw:group_id:user1<,user2>...")

adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel

how can i count the number of users for every line? or for a single line?

for example, if i want to know how many users "adm1Group" contains, the output should be 3, because adm1Group has three users (adm2, adm1 and adm3). another example: the first line (group name "adm") contains two users, syslog and adm1.

the main problem is that there are two field separators here, so how can i split the $4 column inside the same awk command? my own solution uses two separate awk commands linked with a pipe, like this (and i don't know if this is correct or "legal" for the kernel):

awk -F: '/adm1Group/ {print $4}' file.txt | awk -F, 'BEGIN {printf "N. of users in adm1Group = "} {print NF}'

can i achieve this in a single awk command? if not, can i use the pipeline above, or is it "bad practice"?

This StackOverflow question might be very useful for you: https://stackoverflow.com/questions/12204192/using-multiple-delimiters-in-awk — Dominique, Jun 13 '22 at 11:42

score 2 · Answer 1 · answered Jun 13 '22 at 11:35

how can i count the number of users for every line? or for a single line?

I would use GNU AWK to count the number of `,` characters in the 4th field and add 1. Let file.txt content be

adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel

then

awk 'BEGIN{FS=":"}{printf "N of users in %s is %s\n", $1, gsub(/,/,"",$4)+1}' file.txt

gives output

N of users in adm is 2
N of users in admins is 3
N of users in ssl-cert is 1
N of users in ala2 is 2
N of users in conda is 9
N of users in adm1Group is 3
N of users in docker is 1

Explanation: I tell GNU AWK that the field separator (FS) is `:`. For each line, printf fills the template with the 1st field ($1) and with the number of replacements that gsub made when replacing `,` with the empty string ("") in the 4th field ($4), increased by 1 (because the last name has no trailing `,`). Note that this alters $4 (it deletes the `,` characters), but for this task that side effect is irrelevant. Note also that with printf you need to provide the newline character (\n) explicitly, as opposed to print.

(tested in gawk 4.2.1)

a clean solution to use gsub! However, you might want to replace it with `,` instead of `""` if the user wants to reuse `$4` again in its original form. — kvantour, Jun 13 '22 at 13:27
yes this was my first thought as solution, but I have encountered a problem with strings that have no user, i.e. the case where this command should output zero. (I know it may be impossible since each user is assigned to a group, but theoretically speaking it is possible with other non-user data), the output would always be 1 in that case, because of the +1, right? or am I wrong ? — sirducas, Jun 13 '22 at 14:12
@sirducas yes, for an empty column the number given would be `1`, but this might be compensated using the ternary operator, for example `{n=gsub(/,/,"",$4);printf "N of users in %s is %s\n", $1, ($4==""?0:n+1)}` — Daweo, Jun 13 '22 at 14:32
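
A minimal sketch of the empty-group case raised in the comments above, assuming the user list is the 4th field. The helper name `count_users_gsub` and the `nogroup` sample line are hypothetical. It runs gsub on a copy of $4, so the field itself is left unchanged, and it tests whether the field is empty rather than whether the comma count is zero, because a group with exactly one user also contains zero commas.

```shell
# Hypothetical helper: reads group lines on stdin, prints "name count".
count_users_gsub() {
  awk -F: '{
    users = $4                      # copy, so $4 and $0 are not modified
    n = gsub(/,/, "", users)        # gsub returns the number of commas removed
    printf "%s %d\n", $1, ($4 == "" ? 0 : n + 1)   # empty field means 0 users
  }'
}

# Sample input; the "nogroup" line is made up to show the empty case.
printf '%s\n' 'adm:x:4:syslog,adm1' 'docker:x:998:manuel' 'nogroup:x:65534:' |
  count_users_gsub
# prints:
# adm 2
# docker 1
# nogroup 0
```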

score 1 · Answer 2 · anubhava · 2022-06-13T11:48:02.613

You can use split for this:

awk -F: '$1 == "adm1Group" {print split($NF, a, /,/)}' file
3


awk -F: '$1 == "conda" {print split($NF, a, /,/)}' file
9

Or to print all of them together:

awk -F: '{print split($NF, a, /,/), "users in group:", $1}' file

2 users in group: adm
3 users in group: admins
1 users in group: ssl-cert
2 users in group: ala2
9 users in group: conda
3 users in group: adm1Group
1 users in group: docker

edited Jun 13 '22 at 11:48

answered Jun 13 '22 at 11:15

anubhava


conda has 9 users, right? i'm just looking for the split built-in function because i never use that – sirducas Jun 13 '22 at 11:31
Yes of course `9`. It was a typo earlier. If it worked out please consider accepting the answer. – anubhava Jun 13 '22 at 11:40
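
Since split came up as unfamiliar in the comments above, here is a small sketch of what it returns and what it stores. The helper name `members_of` and the `empty` sample group are hypothetical. split returns the number of elements it created and fills the array with indices 1..n; for an empty string it returns 0, so a group without users needs no special case. The exact comparison `$1 == group` also avoids the substring matches that a regex such as /adm1Group/ would allow.

```shell
# Hypothetical helper: $1 = group name, stdin = group file.
members_of() {
  awk -F: -v group="$1" '
    $1 == group {
      n = split($NF, users, ",")    # n = element count, users[1..n] = the names
      print n
      for (i = 1; i <= n; i++) print i ": " users[i]
    }'
}

printf '%s\n' 'adm1Group:x:1022:adm2,adm1,adm3' 'empty:x:1:' | members_of adm1Group
# prints:
# 3
# 1: adm2
# 2: adm1
# 3: adm3
```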

score 1 · Accepted Answer · RavinderSingh13 · 2022-06-13T11:33:56.497

With your shown samples, please try the following awk code. It prints the total number of users in each group of your Input_file.

awk -F':' '
{
  num=0
  arr1[$1]=num=split($NF,arr2,",")
}
END{
  for(i in arr1){
    print "Group " i " has " arr1[i] " users."
  }
}
' Input_file

Explanation: Adding detailed explanation for above code.

awk -F':' '                          ##Start the awk program with : as the field separator.
{
  num=0                              ##Reset num to 0 for each line.
  arr1[$1]=num=split($NF,arr2,",")   ##Split the last field on , into arr2; split returns the number of elements, which is stored in num and in arr1 under the group name ($1).
}
END{                                 ##Start the END block, which runs after all lines are read.
  for(i in arr1){                    ##Traverse arr1 (the traversal order is unspecified).
    print "Group " i " has " arr1[i] " users."  ##Print each group name and its number of users.
  }
}
' Input_file                         ##Name of the input file.

edited Jun 13 '22 at 11:33

answered Jun 13 '22 at 11:18

RavinderSingh13


yes, this works perfectly, but could you please explain just the second line of the 1st awk block? there is like a "double" assignment (arr=num=split(..)), can we simplify that line maybe in two separated lines ? – sirducas Jun 13 '22 at 11:29
@sirducas, yeah I am going to add detailed explanation in a min or so. – RavinderSingh13 Jun 13 '22 at 11:30
@sirducas, I have added detailed explanation for my code above, let me know in case of any queries. – RavinderSingh13 Jun 13 '22 at 11:34
so, the split function always return an integer, or it returns an integer only in this scenario because we are using $NF? – sirducas Jun 13 '22 at 15:05
@sirducas, so this is how it works: `split($NF,arr2,",")` splits the last column ($NF) into array arr2 using `,` as the delimiter and returns the total number of elements in arr2, i.e. the number of users separated by `,` in the last field. That number is stored in `num`, which is then assigned as the value of arr1[$1]. Let me know in case of any queries, cheers. – RavinderSingh13 Jun 13 '22 at 15:08
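
The chained assignment asked about in the comments above can be written as two statements. The sketch below does that and also records the input order, because `for (i in arr)` visits keys in an unspecified order. The helper name `count_per_group` and the array names are hypothetical, and it assumes group names are unique, as in the sample file.

```shell
# Hypothetical helper: reads group lines on stdin, reports in input order.
count_per_group() {
  awk -F: '
    {
      num = split($NF, users, ",")  # split returns the element count as a number
      count[$1] = num               # store it under the group name
      order[NR] = $1                # remember the order in which groups appeared
    }
    END {
      for (i = 1; i <= NR; i++)
        print "Group " order[i] " has " count[order[i]] " users."
    }'
}

printf '%s\n' 'adm:x:4:syslog,adm1' 'docker:x:998:manuel' | count_per_group
# prints:
# Group adm has 2 users.
# Group docker has 1 users.
```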

score 1 · Answer 4 · answered Jun 13 '22 at 12:38

Use : or , as the field separator, then print the number of fields minus the 3 leading ones:

awk -F'[:,]' '{print $1, NF - 3}' file

awk -F'[:,]' -v group=conda '$1 == group {print NF - 3}' file

answered Jun 13 '22 at 12:38

glenn jackman


i like this solution and it's working, however this command is restricted to this text file, with this particular structure, am i right? for example, if i don't know the exact number of fields between the group name and the user list, does this command still work? – sirducas Jun 13 '22 at 14:23
You're correct, the hardcoded `3` is a problem for variable file formats. If you know that the comma-separated field is the last one, then use one of the `split` solutions given. – glenn jackman Jun 13 '22 at 16:40
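
One more edge case for the `[:,]` separator, sketched under the same assumption that exactly three fields precede the user list: a line with an empty user list still ends in an (empty) 4th field, so `NF - 3` alone would report 1. The helper name `count_users_fs` and the `nogroup` sample line are hypothetical.

```shell
# Hypothetical helper: reads group lines on stdin, prints "name count".
count_users_fs() {
  awk -F'[:,]' '{
    n = NF - 3                 # fields after group name, password and id
    if ($NF == "") n--         # an empty last field is not a user
    print $1, n
  }'
}

printf '%s\n' 'adm:x:4:syslog,adm1' 'nogroup:x:65534:' | count_users_fs
# prints:
# adm 2
# nogroup 0
```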

score 0 · Answer 5 · answered Jun 14 '22 at 23:47

{m,g}awk '$!NF=sprintf("%20s\47s user(s) count = %\0478.f",$!_,NF-!_)' FS=':.+:|,'

          adm's  user(s) count =         2
       admins's  user(s) count =         3
     ssl-cert's  user(s) count =         1
         ala2's  user(s) count =         2
        conda's  user(s) count =         9
    adm1Group's  user(s) count =         3
       docker's  user(s) count =         1

And with a tiny bit of modification, the full list of users is now available at the tail. Specifically, the change is `$!_` in place of `$!NF` at the start of the program: it now overwrites $1 instead of $0:

{m,g}awk ' $!_ = sprintf("%15s\47s user(s) count = %\0476.f",$!_,NF-!_)' FS=':.+:|,'

        adm's user(s) count =      2 syslog adm1
     admins's user(s) count =      3 adm2 adm12 manuel
   ssl-cert's user(s) count =      1 postgres
       ala2's user(s) count =      2 aceto salvemini
      conda's user(s) count =      9 giovannelli galise aceto caputo haymele salvemini scala adm2 adm12
  adm1Group's user(s) count =      3 adm2 adm1 adm3
     docker's user(s) count =      1 manuel

answered Jun 14 '22 at 23:47

RARE Kpop Manifesto


Why do you write `!NF` instead of `0` and `!_` instead of `1`? – jarno Dec 05 '22 at 09:09
@jarno : cuz `mawk`s sometimes act up when assign a number only to `$0`, while assigning to `$1 = $0 = …` is overly verbose - `$!NF` is the way to circumvent that issue. – RARE Kpop Manifesto Dec 05 '22 at 13:10
Oh, I guess it is a bug in `mawk`, then. – jarno Dec 05 '22 at 14:16
@jarno : actually all have this, actually, by design : `echo 'abc' | gawk '$_ = 0'` prints absolutely nothing, because the "pattern" evaluated to false thanks to that 0. However, `echo 'abc' | gawk '$_ = "0"'` prints the `"0"` as desired since it's a non-empty `ASCII` string that happens to contain the digit "0" instead of being a number. – RARE Kpop Manifesto Dec 06 '22 at 09:18
@jarno : the safest way to guarantee printing regardless of what's being assigned, is to take the 0th-power of the entire expression - it could be positive INFinity, negative NaN, or a string of 5 emojis, doesn't matter - 0-th power in `awk` always yields a 1 – RARE Kpop Manifesto Dec 06 '22 at 09:22
Similarly `echo 'abc' | gawk '$0 = 0'` prints nothing and `echo 'abc' | gawk '$0 = "0"'` prints "0" so I do not see the point in the obscure way of telling the index. – jarno Dec 06 '22 at 10:49
@jarno : it's merely a style preference of mine that's all. you're free to pick whatever works for u – RARE Kpop Manifesto Dec 06 '22 at 11:07
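
For readers who find the idioms discussed above hard to parse, here is a plainer sketch of the same idea, not the answerer's own code. In the original, `!NF` is 0 on any non-empty line, so `$!NF` is `$0`; `_` is an unset variable, so `!_` is 1, `$!_` is `$1` and `NF-!_` is `NF-1`; `\47` is the octal escape for an apostrophe. The separator regex `:.+:|,` treats everything from the first colon to the last colon as one separator and each comma as another, which leaves the group name in $1 and one field per user after it. The helper name `count_users_readable` is hypothetical and the output format is simplified.

```shell
# Hypothetical helper: reads group lines on stdin.
count_users_readable() {
  # $1 is the group name, every remaining field is one user.
  awk -F':.+:|,' '{ printf "%s: %d user(s)\n", $1, NF - 1 }'
}

printf '%s\n' 'adm:x:4:syslog,adm1' 'docker:x:998:manuel' | count_users_readable
# prints:
# adm: 2 user(s)
# docker: 1 user(s)
```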
