How to remove some words in specific field using awk?

Question

I have several lines of text, and I want to use awk to extract the number that follows a specific word.

I tried the following code but it does not work.

First, create the test file with vi test.text. It has 3 columns (the fields are generated by other pipeline commands that use awk).

Index  AllocTres                              CPUTotal
1      cpu=1,mem=256G                         18
2      cpu=2,mem=1024M                        16
3                                             4
4      cpu=12,gres/gpu=3                      12
5                                             8
6                                             9
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21

Note that several fields in this file are empty. What I want is to keep only the number following the first gres/gpu= entry and remove all the cpu= and mem= parts, using a pipeline like cat test.text | awk '{some_commands}', so that the output still has 3 columns:

Index  AllocTres                              CPUTotal
1                                             18
2                                             16
3                                             4
4      3                                      12
5                                             8
6                                             9
7      4                                      20
8      3                                      21
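To reproduce the answers below without opening an editor, the sample input can also be written non-interactively with a shell here-document. This is a small sketch; the file name test.text is the one used in the question.

```shell
# Write the sample table from the question to test.text.
# Quoting the EOF delimiter stops the shell from expanding anything inside.
cat > test.text <<'EOF'
Index  AllocTres                              CPUTotal
1      cpu=1,mem=256G                         18
2      cpu=2,mem=1024M                        16
3                                             4
4      cpu=12,gres/gpu=3                      12
5                                             8
6                                             9
7      cpu=13,gres/gpu=4,gres/gpu:ret6000=2   20
8      mem=12G,gres/gpu=3,gres/gpu:1080ti=1   21
EOF
```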

RavinderSingh13 · Accepted Answer · 2022-06-11T09:40:14.373

1st solution: With your shown samples, please try the following GNU awk code. It preserves the spacing between fields.

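awk '
FNR==1{ print; next }                       # print the header unchanged
match($0,/[[:space:]]+/){                   # first run of spaces (the gap after Index)
  space=substr($0,RSTART,RLENGTH-1)         # keep it minus one char, which OFS adds back
}
{
  match($2,/gres\/gpu=([0-9]+)/,arr)        # arr[1] = digits after the first gres/gpu=
  match($0,/^[^[:space:]]+[[:space:]]+[^[:space:]]+([[:space:]]+)/,arr1)  # arr1[1] = spaces after field 2
  space1=sprintf("%"length($2)-length(arr[1])"s",OFS)   # padding up to the old width of field 2
  if(NF>2){ sub(OFS,"",arr1[1]);$2=space arr[1] space1 arr1[1] }  # rebuild field 2 with the original spacing
}
1
'   Input_file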

With the shown samples, the code above produces:

Index  AllocTres                              CPUTotal
1                                             18
2                                             16
3                                             4
4      3                                      12
5                                             8
6                                             9
7      4                                      20
8      3                                      21

2nd solution: If you don't care about preserving the spacing, try the following GNU awk code.

awk 'FNR==1{print;next} match($2,/gres\/gpu=([0-9]+)/,arr){$2=arr[1]} 1' Input_file

Explanation: a detailed walk-through of the 2nd solution's code.

awk '             ##Start the awk program.
FNR==1{           ##If this is the first line (the header), do the following.
  print           ##Print the current line.
  next            ##Skip all remaining statements for this line.
}
match($2,/gres\/gpu=([0-9]+)/,arr){  ##Use match() to find gres/gpu= followed by digits; the digits are captured in arr[1].
  $2=arr[1]       ##Replace the 2nd field with the captured digits.
}
1                 ##Print the current line, edited or not.
' Input_file      ##Name of the input file.
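The 2nd solution relies on the three-argument form of match(), which is a GNU awk extension, and as written it leaves rows 1 and 2 (which have an AllocTres value but no gres/gpu entry) unchanged, so their cpu= and mem= text survives. Below is a minimal POSIX-awk sketch that also blanks field 2 in that case. The function name gpu_only is hypothetical, and like the 2nd solution it does not preserve the column spacing.

```shell
# Hypothetical helper: gpu_only reads the table from file arguments or stdin.
# POSIX awk only: two-argument match(), then RSTART/RLENGTH.
gpu_only() {
  awk '
  FNR == 1 { print; next }               # header passes through unchanged
  NF == 3 {                              # rows that actually have an AllocTres field
      if (match($2, /gres\/gpu=[0-9]+/))
          $2 = substr($2, RSTART + 9, RLENGTH - 9)   # drop the 9-char "gres/gpu=" prefix
      else
          $2 = ""                        # no gpu entry: blank the field
  }
  1                                      # print every row; modified rows are re-joined with single spaces
  ' "$@"
}
```

Usage: gpu_only test.text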

score 0 · Answer 2 · answered Jun 11 '22 at 09:47

Using GNU sed

$ sed 's~\( \+\)[^,]*,\(gres/gpu=\([0-9]\)\|[^ ]*\)[^ ]* \+~\1\3 \t\t\t\t      ~' input_file
Index  AllocTres                              CPUTotal
1                                             18
2                                             16
3                                             4
4      3                                      12
5                                             8
6                                             9
7      4                                      20
8      3                                      21

potong · Answer 3 · 2022-06-12T10:30:15.827

This might work for you (GNU sed):

sed -E '/=/!b
        s/\S+/\n&\n/2;h
        s/.*\n(.*)\n.*/\1/
        /gpu=/!{s/./ /g;G;s/(^.*)\n(.*)\n.*\n/\2\1/p;d}
        s/gpu=([^,]*)/\n\1    \n/;s/(.*)\n(.*\n)/\2\1/;H
        s/.*\n//;s/./ /g;H;g
        s/\n.*\n(.*)\n(.*)\n.*\n(.*)/\2\3\1/' file

In essence, the solution above uses the hold space as a scratchpad for intermediate results. Those results are gathered by isolating the second field and then the gpu info within it. Step by step:

If the line contains no = (it has no second field, or it is the header), leave it alone.

Surround the second field by newlines and make a copy.

Isolate the second field.

If the second field contains no gpu info, replace every character of it with a space and, using the copy, reassemble the line accordingly.

Otherwise, isolate the gpu info, move it to the front of the line and append that to the copy of the line in the hold space.

Meanwhile, remove the gpu info from the pattern space and replace each remaining character with a space.

Append these spaces to the copy, then overwrite the pattern space with the copy.

Lastly, knowing each part of the line has been split by newlines, reassemble the parts into the desired format.
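The copy-then-append pattern these steps rely on can be seen in isolation in a tiny sketch (the function name hold_demo is hypothetical): h copies the pattern space to the hold space, s/./ /g turns every character into a space, and G appends a newline plus the saved copy.

```shell
# Hypothetical demo of the sed hold space used as a scratchpad:
#   h        save an untouched copy of the line in the hold space
#   s/./ /g  replace every character in the pattern space with a space
#   G        append a newline plus the saved copy to the pattern space
hold_demo() {
  sed 'h;s/./ /g;G'
}
```

For example, printf 'cpu=1\n' | hold_demo prints a line of five spaces followed by the original cpu=1.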

N.B. The solution assumes the columns are separated by real spaces. If the file contains tabs, prepend the sed command s/\t/        /g (which replaces each tab with 8 spaces).
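Replacing each tab with a fixed run of 8 spaces only reproduces the on-screen layout when every tab starts exactly at a tab stop. As a hedged alternative, the POSIX expand utility converts each tab into however many spaces are needed to reach the next tab stop. The wrapper name detab is hypothetical, and the 8-column tab stop is an assumption about how the file was written.

```shell
# Hypothetical wrapper: convert tabs to spaces using tab stops every 8 columns.
detab() {
  expand -t 8 "$@"
}
```

Usage: run detab file | sed -E '...' with the full sed script from this answer in place of the dots, instead of passing file to sed directly.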

Alternative:

sed -E '/=/!b
        s/\S+/\n&\n/2;h
        s/.*(\n.*)\n.*/\1/;s/(.)(.*gpu=)([^,]+)/\3\1\2/;H
        s/.*\n//;s/./ /g;G
        s/(.*)\n(.*)\n.*\n(.*)\n(.*)\n.*$/\2\4\1\3/' file

In this solution, rather than treating lines that have a second field but no gpu info as a separate case, I introduce a placeholder for the missing info and follow the same steps as if gpu info were present.

score 0 · Answer 4 · answered Jun 11 '22 at 14:52

awk '
FNR>1 && NF==3 {
    n = split($2, a, ",")
    for (i=1; a[i] !~ /gres\/gpu=[0-9]+,?/ && i<=n; ++i);
    sub(/.*=/, "", a[i])
    $2 = a[i]
}
NF==2 {$3=$2; $2=""}
{printf "%-7s%-11s%s\n",$1,$2,$3}' test.txt

Output:

Index  AllocTres  CPUTotal
1                 18
2                 16
3                 4
4      3          12
5                 8
6                 9
7      4          20
8      3          21

You can adjust column widths as desired.

This assumes the first and last columns always have a value, so that NF (the number of fields) can be used to tell whether field 2 is present. If it is (NF==3), split that field on commas, scan the resulting array for the first gres/gpu= entry, strip everything up to and including the =, and print the three fields (if there is no such entry, field 2 ends up empty). If field 2 is empty (NF==2), the second-to-last line of the program moves the value into $3 and inserts an empty field 2, so printf always receives three fields.

If the assumption above does not hold, field 2 can instead be identified by its character position.
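A sketch of that position-based variant, assuming the AllocTres and CPUTotal values start in the same columns as their header names (as they do in the sample). The function name gpu_by_position is hypothetical. It reads the column boundaries from the header with index(), cuts field 2 out with substr(), and pads the result back to the original width, so rows with an empty field 2 need no special case.

```shell
# Hypothetical helper: find field 2 by character position instead of by NF.
# Assumes values are left-aligned under the header names AllocTres and CPUTotal.
gpu_by_position() {
  awk '
  FNR == 1 {
      s = index($0, "AllocTres")          # first column of field 2
      e = index($0, "CPUTotal")           # first column of field 3
      fmt = "%s%-" (e - s) "s%s\n"        # pad field 2 back to its original width
      print; next
  }
  {
      mid = substr($0, s, e - s)          # field 2 including its padding (may be all spaces)
      gpu = ""
      if (match(mid, /gres\/gpu=[0-9]+/))
          gpu = substr(mid, RSTART + 9, RLENGTH - 9)   # digits after the first gres/gpu=
      printf fmt, substr($0, 1, s - 1), gpu, substr($0, e)
  }' "$@"
}
```

Usage: gpu_by_position test.text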

RARE Kpop Manifesto · Answer 5 · 2022-06-12T06:33:10.087

An awk-based solution that does not need

- array splitting,
- regex back-references,
- prior-state tracking, or
- multiple passes over the input (multiple passes over /dev/stdin would themselves require state tracking)


{mng}awk '!_~NF || sub("[^ ]+$", sprintf("%*s&", length-length($!(NF=NF)),_))' \
             FS='[ ][^ \\/]*gres[/]gpu[=]|[,: ][^= ]+[=][^,: ]+' OFS=

Index  AllocTres                              CPUTotal
1                                             18
2                                             16
3                                             4
4     3                                       12
5                                             8
6                                             9
7     4                                       20
8     3                                       21

If you don't need nawk compatibility, there is an even simpler single-pass approach with just one all-encompassing call to sub() per line:

awk ' sub("[^ ]*$", sprintf("%*s&", length($_) - length($(\
     gsub(" [^ /]*gres[/]gpu=|[,: ][^= ]+=[^,: ]+", _)*_)),_))'

or, even more condensed but less readable:

awk 'sub("[^ ]*$",sprintf("%*s&",length^gsub(" [^ /]*gres\/gpu=|"\
                          "[,: ][^= ]+=[^,: ]+",_)^_ - length,_) )'
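The trick in these one-liners is easier to follow unrolled. The sketch below (the function name strip_and_repad is hypothetical) uses the same deletion regex: it records the line length, deletes the unwanted text with gsub(), then prepends to the last field as many spaces as were deleted, so CPUTotal stays in its original column. As in the output shown for this answer, the gpu number itself ends up one column to the left, because the deleted text includes the space before it.

```shell
# Hypothetical unrolled version of the gsub()/sub() one-liners in this answer.
strip_and_repad() {
  awk '{
      before = length($0)
      # delete from the space before field 2 up to the first gres/gpu=,
      # and every other key=value item preceded by a comma, colon, or space
      gsub(" [^ /]*gres[/]gpu=|[,: ][^= ]+=[^,: ]+", "")
      pad = ""
      for (n = before - length($0); n > 0; n--)
          pad = pad " "                  # one space per deleted character
      sub("[^ ]*$", pad "&")             # prefix the last field with the padding
      print
  }' "$@"
}
```

Usage: strip_and_repad test.text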
