given the text file:
(the structure is: "group_name:pw:group_id:user1<,user2>...")

adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel

how can i count the number of users for every line? or for a single line?

for example, if i want to know how many users "adm1Group" contains, the output should be 3, because adm1Group has three users (adm2, adm1 and adm3). another example: the first line (group name "adm") contains two users, syslog and adm1.

the main problem is that there are two field separators here, so how can i split the $4 column inside the same awk command? i have my own solution, but it uses two different awk commands linked with a pipe, like this (and i don't know if this is correct or "legal" for the kernel):

awk -F: '/adm1Group/ {print $4}' file.txt | awk -F, 'BEGIN {printf "N. of users in adm1Group = "} {print NF}'

can i achieve the same result in a single awk command? if not, can i use this? or is this solution "bad practice"?

sirducas
  • This StackOverflow question might be very useful for you: https://stackoverflow.com/questions/12204192/using-multiple-delimiters-in-awk – Dominique Jun 13 '22 at 11:42

5 Answers

how can i count the number of users for every line? or for a single line?

I would use GNU AWK to count the commas (,) inside the 4th field and add 1 to that count. Let file.txt content be

adm:x:4:syslog,adm1
admins:x:1006:adm2,adm12,manuel
ssl-cert:x:122:postgres
ala2:x:1009:aceto,salvemini
conda:x:1011:giovannelli,galise,aceto,caputo,haymele,salvemini,scala,adm2,adm12
adm1Group:x:1022:adm2,adm1,adm3
docker:x:998:manuel

then

awk 'BEGIN{FS=":"}{printf "N of users in %s is %s\n", $1, gsub(/,/,"",$4)+1}' file.txt

gives output

N of users in adm is 2
N of users in admins is 3
N of users in ssl-cert is 1
N of users in ala2 is 2
N of users in conda is 9
N of users in adm1Group is 3
N of users in docker is 1

Explanation: I tell GNU AWK that the field separator (FS) is :. For each line I use printf, which fills in a template and prints it. The values filled in are the 1st field ($1) and the number of replacements gsub made when told to replace , with the empty string ("") in the 4th field ($4), increased by 1 (because the last name has no trailing ,). Note that this does alter $4 (it deletes the , characters), but for this task that side effect is irrelevant. Note also that with printf you need to provide the newline character (\n) explicitly, as opposed to print.

(tested in gawk 4.2.1)
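
Two points raised in the comments on this answer can be handled together. The following is a minimal sketch, not part of the answer as posted: it counts commas on a copy of $4 so the field itself stays untouched, and it reports 0 when the member list is empty. The names count_members and members are made up for illustration, and the nogroup line is a hypothetical entry added only to exercise the empty case.

```shell
# Reads group-file lines on stdin and prints one count per group.
count_members() {
  awk -F: '{
    members = $4                 # copy, so that $4 and $0 are not modified
    # an empty list has no users; otherwise users = commas + 1
    n = (members == "") ? 0 : gsub(/,/, "", members) + 1
    printf "N of users in %s is %d\n", $1, n
  }'
}

# Usage: the third line is a hypothetical group with no members.
printf '%s\n' 'adm:x:4:syslog,adm1' 'ssl-cert:x:122:postgres' 'nogroup:x:65534:' | count_members
```

Testing the field for emptiness, rather than testing the comma count for 0, matters because a group with exactly one user also has zero commas.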

Daweo
  • a clean solution to use gsub! However, you might want to replace it with `,` instead of `""` if the user wants to reuse `$4` again in its original form. – kvantour Jun 13 '22 at 13:27
  • yes this was my first thought as solution, but I have encountered a problem with strings that have no user, i.e. the case where this command should output zero. (I know it may be impossible since each user is assigned to a group, but theoretically speaking it is possible with other non-user data), the output would always be 1 in that case, because of the +1, right? or am I wrong ? – sirducas Jun 13 '22 at 14:12
  • @sirducas yes, for an empty column the number given would be `1`, but this can be compensated for with the ternary operator, testing the field itself so that a single user still counts as 1, for example `{n=($4==""?0:gsub(/,/,"",$4)+1);printf "N of users in %s is %s\n", $1, n}` – Daweo Jun 13 '22 at 14:32

You can use split for this:

awk -F: '$1 == "adm1Group" {print split($NF, a, /,/)}' file
3


awk -F: '$1 == "conda" {print split($NF, a, /,/)}' file
9

Or to print all of them together:

awk -F: '{print split($NF, a, /,/), "users in group:", $1}' file

2 users in group: adm
3 users in group: admins
1 users in group: ssl-cert
2 users in group: ala2
9 users in group: conda
3 users in group: adm1Group
1 users in group: docker
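
Because split returns the number of pieces it produced, it returns 0 for an empty string, so a group without members is counted correctly with no special case. A small sketch, assuming the group name is passed as a shell argument; the function name count_group and the nogroup line are illustrative and not from the answer.

```shell
# Print the number of users in the group named by the first argument.
# Reads group-file lines on stdin.
count_group() {
  awk -F: -v group="$1" '$1 == group { print split($NF, members, ",") }'
}

# Usage: the second input line is a hypothetical group with no members.
printf '%s\n' 'adm1Group:x:1022:adm2,adm1,adm3' 'nogroup:x:65534:' | count_group adm1Group
printf '%s\n' 'adm1Group:x:1022:adm2,adm1,adm3' 'nogroup:x:65534:' | count_group nogroup
```

Passing the name with -v keeps the awk program fixed and compares it against $1 exactly, so "adm" does not also match "adm1Group" or "admins".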
anubhava

With your shown samples and attempts, please try the following awk code. It prints the total number of users in each group of your Input_file.

awk -F':' '
{
  num=0
  arr1[$1]=num=split($NF,arr2,",")
}
END{
  for(i in arr1){
    print "Group " i " has " arr1[i] " users."
  }
}
' Input_file

Explanation: Adding a detailed explanation of the above code.

awk -F':' '                          ##Start the awk program and set the field separator to : here.
{
  num=0                              ##Reset num to 0 for each line.
  arr1[$1]=num=split($NF,arr2,",")   ##split() breaks the last field ($NF) on , into arr2 and returns the number of elements; that count is assigned to num and then stored in arr1 under the index $1 (the group name).
}
END{                                 ##Start the END block, which runs after all lines are read.
  for(i in arr1){                    ##Traverse arr1 (the order of traversal is not guaranteed).
    print "Group " i " has " arr1[i] " users."  ##Print the group name and its user count.
  }
}
' Input_file                         ##Mention the Input_file name here.
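
The chained assignment can also be written as two separate statements, which some readers find easier to follow. This is a sketch of that equivalent form, not the answer's own code: the names group_sizes, users and size are chosen here for illustration, and the initial num=0 is dropped because split overwrites num anyway.

```shell
# Reads group-file lines on stdin; prints one summary line per group.
group_sizes() {
  awk -F: '
  {
    num = split($NF, users, ",")   # split() returns how many pieces it stored in users
    size[$1] = num                 # remember that count under the group name
  }
  END {
    for (name in size)
      print "Group " name " has " size[name] " users."
  }'
}

printf '%s\n' 'adm:x:4:syslog,adm1' 'docker:x:998:manuel' | group_sizes | sort
```

The trailing sort is there because awk does not define the order in which `for (name in size)` visits the groups.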
RavinderSingh13
  • yes, this works perfectly, but could you please explain just the second line of the 1st awk block? there is like a "double" assignment (arr=num=split(..)), can we simplify that line maybe in two separated lines ? – sirducas Jun 13 '22 at 11:29
  • @sirducas, yeah I am going to add detailed explanation in a min or so. – RavinderSingh13 Jun 13 '22 at 11:30
  • @sirducas, I have added detailed explanation for my code above, let me know in case of any queries. – RavinderSingh13 Jun 13 '22 at 11:34
  • so, the split function always return an integer, or it returns an integer only in this scenario because we are using $NF? – sirducas Jun 13 '22 at 15:05
  • @sirducas, this is how it works: the `split($NF,arr2,",")` function splits the last column ($NF) into the array arr2 using `,` as the delimiter, and it always returns the number of elements it created, whatever string is passed in. That count (the number of comma-separated users in the last field) is stored in `num`, which is then assigned as the value in arr1. Let me know in case of any queries, cheers. – RavinderSingh13 Jun 13 '22 at 15:08

Use : or , as the field separator, then print the number of fields minus the 3 leading ones:

awk -F'[:,]' '{print $1, NF - 3}' file
awk -F'[:,]' -v group=conda '$1 == group {print NF - 3}' file
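
One case worth checking with this approach is a group whose member list is empty: the line still ends in an empty 4th field, so NF - 3 reports 1 rather than 0. A hedged sketch that guards against it, assuming the same four-column layout; the function name count_by_fields and the nogroup line are hypothetical.

```shell
# Print "group count" per line; reads group-file lines on stdin.
count_by_fields() {
  # an empty last field means the member list is empty
  awk -F'[:,]' '{ print $1, ($NF == "" ? 0 : NF - 3) }'
}

printf '%s\n' 'conda:x:1011:aceto,caputo' 'nogroup:x:65534:' | count_by_fields
```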
glenn jackman
  • i like this solution and its working, however this command is restricted to this file text, with this particular structure, am i right? for example, if i don't know the exact number of fields between the name group and the user list, is this command still working? – sirducas Jun 13 '22 at 14:23
  • You're correct, the hardcoded `3` is a problem for variable file formats. If you know that the comma-separated field is the last one, then use one of the `split` solutions given. – glenn jackman Jun 13 '22 at 16:40
{m,g}awk '$!NF=sprintf("%20s\47s user(s) count = %\0478.f",$!_,NF-!_)' FS=':.+:|,'

          adm's  user(s) count =         2
       admins's  user(s) count =         3
     ssl-cert's  user(s) count =         1
         ala2's  user(s) count =         2
        conda's  user(s) count =         9
    adm1Group's  user(s) count =         3
       docker's  user(s) count =         1

With a small modification, the full list of users is also printed at the end of each line. The key change is that the sprintf result now overwrites $1 (written $!_) instead of $0 (written $!NF):

{m,g}awk ' $!_ = sprintf("%15s\47s user(s) count = %\0476.f",$!_,NF-!_)' FS=':.+:|,'

        adm's user(s) count =      2 syslog adm1
     admins's user(s) count =      3 adm2 adm12 manuel
   ssl-cert's user(s) count =      1 postgres
       ala2's user(s) count =      2 aceto salvemini
      conda's user(s) count =      9 giovannelli galise aceto caputo haymele salvemini scala adm2 adm12
  adm1Group's user(s) count =      3 adm2 adm1 adm3
     docker's user(s) count =      1 manuel
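
For readers who find the compact form hard to parse: NF is never 0 on these lines, so !NF is 0 and $!NF is $0; _ is an unset variable, so !_ is 1 and $!_ is $1. With FS set to ':.+:|,' the whole ":pw:group_id:" stretch acts as a single separator, which leaves the group name in $1 and one field per user after it, so NF - 1 is the user count. Below is a plain sketch of the first command under those readings. The function name user_counts is made up, and the locale-dependent ' grouping flag in the number format is left out.

```shell
# Same kind of report as the first command above, spelled out with $1 and NF - 1.
user_counts() {
  # \047 is the octal escape for a single quote inside an awk string
  awk -F':.+:|,' '{ printf "%20s\047s user(s) count = %8d\n", $1, NF - 1 }'
}

printf '%s\n' 'adm:x:4:syslog,adm1' 'docker:x:998:manuel' | user_counts
```

Calling printf directly also sidesteps the question of whether an assignment used as a pattern evaluates to true.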
RARE Kpop Manifesto
  • Why do you write `!NF` instead of `0` and `!_` instead of `1`? – jarno Dec 05 '22 at 09:09
  • @jarno : cuz `mawk`s sometimes act up when assign a number only to `$0`, while assigning to `$1 = $0 = …` is overly verbose - `$!NF` is the way to circumvent that issue. – RARE Kpop Manifesto Dec 05 '22 at 13:10
  • Oh, I guess it is a bug in `mawk`, then. – jarno Dec 05 '22 at 14:16
  • @jarno : actually all of them behave this way, by design : `echo 'abc' | gawk '$_ = 0'` prints absolutely nothing, because the "pattern" evaluates to false thanks to that 0. However, `echo 'abc' | gawk '$_ = "0"'` prints the `"0"` as desired since it's a non-empty `ASCII` string that happens to contain the digit "0" instead of being a number. – RARE Kpop Manifesto Dec 06 '22 at 09:18
  • @jarno : the safest way to guarantee printing regardless of what's being assigned, is to take the 0th-power of the entire expression - it could be positive INFinity, negative NaN, or a string of 5 emojis, doesn't matter - 0-th power in `awk` always yields a 1 – RARE Kpop Manifesto Dec 06 '22 at 09:22
  • Similarly `echo 'abc' | gawk '$0 = 0'` prints nothing and `echo 'abc' | gawk '$0 = "0"'` prints "0" so I do not see the point in the obscure way of telling the index. – jarno Dec 06 '22 at 10:49
  • @jarno : it's merely a style preference of mine that's all. you're free to pick whatever works for u – RARE Kpop Manifesto Dec 06 '22 at 11:07