Bash: replacing pattern in specific column but only in lines between two patterns

Question

I have files with this kind of structure:

abc
def
ghi
...
x x y x x
x x z x x
x x y x x
...
JKL
x x y x x
x x z x x
x x y x x
...
...
*empty line*
mno
pqr
...
...

I would like to copy the whole file to a new file, but with some changes. First, I want to affect only the lines between the pattern JKL and the next empty line. On top of that, I need to replace every occurrence of the pattern y with a new pattern NEW, but only if it appears in the third column.

I tried using sed, but I got stuck on how to select columns:

sed -ne '/JKL/,/^$/s/y/NEW/'

This, of course, replaced y with NEW regardless of which column it was in.

I also looked into awk, but I could only find examples covering each of my two needs separately, and I wasn't able to combine them. How can I do it?

score 2 · Answer 1 · answered Sep 04 '18 at 09:14

The third column is what follows, from the start of the line, a run of non-spaces, a space, another run of non-spaces, and a space. Capture that prefix and require a space after the y, so that only a third column that is exactly y gets replaced:

sed '/^JKL$/,/^$/s/^\([^ ][^ ]* [^ ][^ ]*\) y /\1 NEW /'

Or, if your sed supports extended regular expressions (-r or -E):

sed -E '/^JKL$/,/^$/s/^([^ ]+ [^ ]+) y /\1 NEW /'
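To try the -E version on a small made-up sample that mirrors the question's layout, it can be wrapped in a shell function. The name replace_col3 is only illustrative, and, like the answer, the sketch assumes single spaces between columns:

```shell
# replace_col3 is a hypothetical helper name; pass file names or pipe data in.
# Between a line that is exactly JKL and the next empty line, a 3rd column
# that is exactly "y" (and is followed by another column) becomes NEW.
replace_col3() {
    sed -E '/^JKL$/,/^$/s/^([^ ]+ [^ ]+) y /\1 NEW /' "$@"
}

# Sample: the y in column 1 of "y x y x x" is left alone, and lines
# before JKL or after the empty line are copied unchanged.
printf '%s\n' 'abc' 'x x y x x' 'JKL' 'x x y x x' 'x x z x x' 'y x y x x' '' 'x x y x x' |
    replace_col3
# Output:
# abc
# x x y x x
# JKL
# x x NEW x x
# x x z x x
# y x NEW x x
#
# x x y x x
```

To produce the new file the question asks for: replace_col3 input.txt > output.txt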

choroba

My example file was oversimplified, my bad. This works only when columns are separated by exactly one space, and only when the first column starts at the beginning of the line. However, I think I got the gist of it! – Lorenzo Gaifas Sep 04 '18 at 09:48
@LorenzoGaifas: It should be easy to replace a space by `[[:space:]]` and add pluses to match more than one occurrence. – choroba Sep 04 '18 at 11:16
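Following that comment, here is a sketch of the generalized sed command, again wrapped in an illustrative function (replace_col3_ws). It assumes a sed with -E, because + is not part of POSIX basic regular expressions. It allows leading whitespace and any run of whitespace between columns, and since only the y itself is rewritten, the original spacing is kept. The trailing ([[:space:]]|$) lets a y in the last column match too, while a longer value such as yes does not:

```shell
# replace_col3_ws is a hypothetical helper name.
# \1 = optional leading whitespace, column 1, whitespace, column 2, whitespace
# \2 = the whitespace after the y (empty when y ends the line)
replace_col3_ws() {
    sed -E '/^JKL$/,/^$/s/^([[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+[[:space:]]+)y([[:space:]]|$)/\1NEW\2/' "$@"
}

# Sample: aligned columns keep their spacing; the y in column 2 of the
# third line is not touched; the last line is outside the block.
printf '%s\n' 'JKL' '  x   x  y  x' '    x  y  z  x  x' 'x x y' '' ' a b y c' |
    replace_col3_ws
```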

Sundeep · Answer 2 · score 1 · 2018-09-04T09:22:35.643

awk also supports range patterns similar to sed's; see "How to select lines between two patterns?" for alternative, more flexible approaches.

awk '/JKL/,/^$/{if($3=="y") $3="NEW"} 1' ip.txt

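/JKL/,/^$/ selects the lines of interest
  - if($3=="y") checks whether the 3rd field is exactly the string y
  - $3="NEW" changes the 3rd field to the desired text
  - if you need to match a regex instead, use sub(/y/, "NEW", $3) or gsub(/y/, "NEW", $3)
1 is the idiomatic way to print the contents of $0 (an always-true condition with no action, so awk applies the default action, print)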
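For the regex route mentioned in the list, a sketch where the third field only has to contain y, and only the matching part is replaced. The helper name col3_sub and the sample values are made up for illustration. As with $3="NEW", a line whose field is changed gets rebuilt with single spaces:

```shell
# col3_sub is a hypothetical helper name.
# Inside the JKL block, the first "y" anywhere in field 3 becomes NEW;
# use gsub() instead of sub() to replace every "y" in that field.
# The ($3 ~ /y/) test leaves lines whose field 3 has no y exactly as they were.
col3_sub() {
    awk '/JKL/,/^$/ { if ($3 ~ /y/) sub(/y/, "NEW", $3) } 1' "$@"
}

printf '%s\n' 'JKL' 'a b xyz c' 'a b z c' '' 'a b xyz c' | col3_sub
```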

edited Sep 04 '18 at 09:22

answered Sep 04 '18 at 09:15

This does what I wanted. However, I just realized there is a problem: my columns have arbitrary numbers of spaces before and after them (to keep them aligned), and this solution somehow removes the extra spaces. – Lorenzo Gaifas Sep 04 '18 at 09:40
When you assign to a field, Awk rebuilds $0 by joining the fields with OFS (a single space by default), so the original spacing is lost. You can work around this by performing a manual substitution, though it's a bit more complex. See https://stackoverflow.com/questions/20835437/how-to-preserve-the-original-whitespace-between-fields-in-awk – tripleee Sep 04 '18 at 09:55
@LorenzoGaifas You could choose to produce a tab-separated output in order to align columns. – simlev Sep 04 '18 at 10:02
@LorenzoGaifas In that case, a sample more representative of your real use case would have helped. It is also not clear whether you are replacing part of the 3rd column or the entire column, and so on. The sed solution in the other answer is better suited: you can specify optional spaces at the start of the line, varying spaces between fields, etc. – Sundeep Sep 04 '18 at 10:09
@Sundeep Yes, I wanted to keep it general for whoever may need the same kind of solution. I think it's clear enough from the comments now, though. Thanks! – Lorenzo Gaifas Sep 04 '18 at 10:37
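One way to do the manual substitution tripleee describes, sketched in plain POSIX awk with no gawk extensions: match() locates the text before the third field, and the line is reassembled with substr(), so no field is ever assigned and all the original spacing survives. The helper name col3_keep_spacing is illustrative, blanks are assumed to be spaces or tabs, and the range ends at a truly empty line as in the question:

```shell
# col3_keep_spacing is a hypothetical helper name.
col3_keep_spacing() {
    awk '
    /JKL/,/^$/ {
        # prefix = optional leading blanks, field 1, blanks, field 2, blanks
        if (match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/)) {
            prefix = substr($0, 1, RLENGTH)
            rest   = substr($0, RLENGTH + 1)
            # field 3 is the run of non-blanks at the start of rest;
            # RLENGTH now refers to this second match
            if (match(rest, /^[^ \t]+/) && substr(rest, 1, RLENGTH) == "y")
                $0 = prefix "NEW" substr(rest, RLENGTH + 1)
        }
    }
    1' "$@"
}

printf '%s\n' 'JKL' ' x x y x x' '    x  y  z  x  x' '' ' x x y x x' | col3_keep_spacing
```

Assigning a whole new string to $0 (rather than to $3) is what keeps the spacing: awk prints $0 exactly as it was built.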

James Brown · Answer 3 · 2018-09-04T11:38:56.703

Using GNU awk and split() with its fourth argument, which stores the separators (a gawk extension). First, some more descriptive test data:

...
JKL
 x x y x x
    x  y  z  x  x

...

Then the script:

$ awk '
/JKL/,/^ *$/ {                 # the desired block
    n=split($0,a,FS,seps)      # split and store the separators
    b=seps[0]                  # seps[0] has the leading whitespace, init buffer with it
    for(i=1;i<=n;i++) {        # iterate all fields
        if(i==3 && a[i]=="y")  # if 3rd field is y
            a[i]="NEW"         # replace it with NEW
        b=b a[i] seps[i]       # build the buffer for output
    }
    print b
}' file

and the output:

JKL
 x x NEW x x
    x  y  z  x  x
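The script above prints only the lines inside the block, while the question asks for a copy of the whole file. Here is a sketch of the same gawk approach that also passes every other line through unchanged. It still requires gawk for the four-argument split(), and the function name is only illustrative:

```shell
# split4_keep_spacing is a hypothetical helper name; requires GNU awk.
# With the default FS, split() puts leading whitespace in seps[0] and
# trailing whitespace in seps[n], so the rebuilt line keeps all spacing.
split4_keep_spacing() {
    gawk '
    /JKL/,/^ *$/ {
        n = split($0, a, FS, seps)   # fields in a[], separators in seps[]
        b = seps[0]                  # start with the leading whitespace
        for (i = 1; i <= n; i++) {
            if (i == 3 && a[i] == "y")
                a[i] = "NEW"
            b = b a[i] seps[i]
        }
        print b
        next                         # block line done; skip the final 1
    }
    1' "$@"                          # lines outside the block: print as-is
}
```

Usage, writing the new file: split4_keep_spacing file > newfile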
