0

How do I modify this awk script? It currently modifies every occurrence, but it should modify every occurrence except those in one column. The main problem is that this column is not always the first one, but I know the name of the column in the header.

awk '
BEGIN{
  FS=OFS=","
}
FNR==1{
  print
  next
}
{
  for(i=1;i<=NF;i++){
    sub(/^\/Text[0-9]+Text/,"",$i)
    sub(/Text.*/,"",$i)
  }
}
1
'  Input_file

Explanation: a detailed, line-by-line explanation of the code above:

awk '
BEGIN{                                 ##Start of the BEGIN section.
  FS=OFS=","                           ##Set both FS and OFS to a comma.
}
FNR==1{                                ##If this is the first line of the file (the header), do the following.
  print                                ##Print the header line unchanged.
  next                                 ##next skips all remaining statements for this line.
}
{
  for(i=1;i<=NF;i++){                  ##Loop over all fields of the current line.
    sub(/^\/Text[0-9]+Text/,"",$i)     ##Remove a leading /Text<digits>Text prefix from the current field.
    sub(/Text.*/,"",$i)                ##Remove everything from the next Text to the end of the current field.
  }
}
1                                      ##1 is an always-true condition, so the edited or unedited line is printed.
'  Input_file                          ##Name of the input file.

Example file:

header1, header2, header3-dont-modify-this-column, header4, header5
,,/Text2234Text7846641Text.html,/Text2234Text7846641Text.html,/Text2234Text823241Text.html
,,/Text2234Text7846642Text.html,/Text2234Text7846642Text.html,/Text2234Text823242Text.html
,,/Text2234Text7846643Text.html,/Text2234Text7846643Text.html,/Text2234Text823243Text.html

Result should be:

header1, header2, header3-dont-modify-this-column, header4, header5
,,/Text2234Text7846641Text.html,7846641,823241
,,/Text2234Text7846642Text.html,7846642,823242
,,/Text2234Text7846643Text.html,7846643,823243

Thank you

Inian

2 Answers

2

It is possible to write a code-golf version that does exactly what you require. Nonetheless, I'll write something more generic that is easier to maintain. The idea is to keep track of the order of the headers in an array h; for example, h[2] contains the header value of the second column. Furthermore, we use an associative array v that is indexed by header value. After modifying the values in v, you can reconstruct each CSV line from v[h[i]].

We set the field separator to a plain comma. If commas can be part of a field due to quoting, have a look at: What's the most robust way to efficiently parse CSV using awk?

awk 'BEGIN{FS=OFS=","}
     (FNR==1) { for(i=1;i<=NF;++i) h[i]=$i; print; next }
     { for(i=1;i<=NF;++i) v[h[i]]=$i }
     { # perform modifications here, e.g. v["header_name"]="new val"
     }
     { for(i=1;i<=NF;++i) printf "%s%s", v[h[i]], (i==NF?ORS:OFS) }' file

Example: I want to modify the columns with headers "h2" and "h3" and give them the value 0.

Input:

h3,h1,h2,h4
1,2,3,4
5,6,7,8

Used AWK:

awk 'BEGIN{FS=OFS=","}
     (FNR==1) { for(i=1;i<=NF;++i) h[i]=$i; print; next }
     { for(i=1;i<=NF;++i) v[h[i]]=$i }
     {  v["h2"]=v["h3"]=0 }
     { for(i=1;i<=NF;++i) printf "%s%s", v[h[i]], (i==NF?ORS:OFS) }' file

Output:

h3,h1,h2,h4
0,2,0,4
0,6,0,8
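
The same header-indexed approach can be applied to the sample from the question. The following is only a sketch under a few assumptions: the variable name `skip` and the file name `sample.csv` are made up for illustration, header names are assumed to be unique (otherwise entries in v would overwrite each other), and header cells are trimmed because the question's header line has a space after each comma.

```shell
# Recreate part of the question's sample input (file name is illustrative).
cat > sample.csv <<'EOF'
header1, header2, header3-dont-modify-this-column, header4, header5
,,/Text2234Text7846641Text.html,/Text2234Text7846641Text.html,/Text2234Text823241Text.html
,,/Text2234Text7846642Text.html,/Text2234Text7846642Text.html,/Text2234Text823242Text.html
EOF

out=$(awk -v skip="header3-dont-modify-this-column" '
BEGIN { FS = OFS = "," }
FNR == 1 {
  for (i = 1; i <= NF; ++i) {
    h[i] = $i
    gsub(/^[ \t]+|[ \t]+$/, "", h[i])   # trim spaces around the stored header name
  }
  print                                 # the header line itself is printed unchanged
  next
}
{ for (i = 1; i <= NF; ++i) v[h[i]] = $i }
{
  for (i = 1; i <= NF; ++i) {
    if (h[i] == skip) continue          # leave the named column untouched
    sub(/^\/Text[0-9]+Text/, "", v[h[i]])
    sub(/Text.*/, "", v[h[i]])
  }
}
{ for (i = 1; i <= NF; ++i) printf "%s%s", v[h[i]], (i == NF ? ORS : OFS) }
' sample.csv)

printf '%s\n' "$out"
```

Passing the name with -v keeps the awk program itself unchanged when a different column has to be protected.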
kvantour
1

Could you please try the following? Going by your sample output, you don't want any substitution in the 3rd field, so I skipped it.

awk '
BEGIN{
  FS=OFS=","
}
FNR==1{
  print
  next
}
{
  for(i=1;i<=NF;i++){
    if(i!=3){
      sub(/^\/Text[0-9]+Text/,"",$i)
      sub(/Text.*/,"",$i)
    }
  }
}
1
'  Input_file


Or use a variable: pass awk a variable that holds the number of the field you want to exclude from substitution.

awk -v ignore_field="3" '
BEGIN{
  FS=OFS=","
}
FNR==1{
  print
  next
}
{
  for(i=1;i<=NF;i++){
    if(i!=ignore_field){
      sub(/^\/Text[0-9]+Text/,"",$i)
      sub(/Text.*/,"",$i)
    }
  }
}
1
'  Input_file

A detailed explanation of the code above:

awk -v ignore_field="3" '                  ##Start the awk program and set the variable ignore_field to 3, the number of the field to skip.
BEGIN{                                     ##Start of the BEGIN section.
  FS=OFS=","                               ##Set both FS and OFS to a comma.
}
FNR==1{                                    ##If this is the first line of the file (the header), do the following.
  print                                    ##Print the header line unchanged.
  next                                     ##next skips all remaining statements for this line.
}
{
  for(i=1;i<=NF;i++){                      ##Loop from i=1 up to NF, the number of fields in the current line.
    if(i!=ignore_field){                   ##If the current field number is NOT equal to ignore_field, do the following.
      sub(/^\/Text[0-9]+Text/,"",$i)       ##Remove a leading /Text<digits>Text prefix from the current field.
      sub(/Text.*/,"",$i)                  ##Remove everything from the next Text to the end of the current field.
    }
  }
}
1                                          ##1 is an always-true condition, so the edited or unedited line is printed.
' Input_file                               ##Name of the input file.
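
Since the question says that only the header name is known, not the column position, the field number could also be looked up from the header line instead of being hard-coded. A minimal sketch, assuming the name is passed in a hypothetical variable `skip_name`, that header cells may carry surrounding spaces as in the question's sample, and that `sample.csv` is just an illustrative file name:

```shell
# Recreate part of the question's sample input (file name is illustrative).
cat > sample.csv <<'EOF'
header1, header2, header3-dont-modify-this-column, header4, header5
,,/Text2234Text7846641Text.html,/Text2234Text7846641Text.html,/Text2234Text823241Text.html
,,/Text2234Text7846642Text.html,/Text2234Text7846642Text.html,/Text2234Text823242Text.html
EOF

out=$(awk -v skip_name="header3-dont-modify-this-column" '
BEGIN { FS = OFS = "," }
FNR == 1 {
  # Find the number of the column whose trimmed header equals skip_name.
  for (i = 1; i <= NF; i++) {
    name = $i
    gsub(/^[ \t]+|[ \t]+$/, "", name)   # trims the copy, not the field itself
    if (name == skip_name) ignore_field = i
  }
  print                                 # header line is printed unchanged
  next
}
{
  for (i = 1; i <= NF; i++) {
    if (i != ignore_field) {            # skip the column found in the header
      sub(/^\/Text[0-9]+Text/, "", $i)
      sub(/Text.*/, "", $i)
    }
  }
}
1
' sample.csv)

printf '%s\n' "$out"
```

If no header matches skip_name, ignore_field stays unset and every column is modified, so a check for that case may be worth adding.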
RavinderSingh13