0

Original line in the file sed.txt:

outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string

I only need to replace PATTERN with pattern where it appears inside parentheses. This is not just lowercasing; the replacement could be any other word.

Expected result:

outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

I can use the regex ([^)]*) to find the substrings in which some words should be replaced. But I can't use it to restrict the substitution to those substrings: used as an address, it makes sed replace every PATTERN on the whole line.

:/tmp$ sed 's/([^)]*)/---/g' sed.txt 
outer_string_PATTERN_string---PATTERN_outer_string---_outer_string

:/tmp$ sed '/([^)]*)/s/PATTERN/pattern/g' sed.txt 
outer_string_pattern_string(pattern_And_pattern_pattern_i)pattern_outer_string(i_pattern_inner)_outer_string

I also tried to use a regex group in sed to capture and replace the words, but I couldn't figure out the command.

Can sed do this? If so, how? Thanks.

Zoe
Victor Lee
  • I don't understand why someone downvoted this question; so weird. I solved it myself, and in the correct way. Thanks to https://stackoverflow.com/questions/1251999/how-can-i-replace-a-newline-n-using-sed – Victor Lee Sep 17 '21 at 13:55
  • https://stackoverflow.com/help/self-answer – Zoe Sep 17 '21 at 13:55

4 Answers

0

As an alternative, this is easier to do in GNU awk, using an RS that matches each (...) substring:

awk -v RS='\\([^)]+)' '{gsub(/PATTERN/, "pattern", RT); ORS=RT} 1' file

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string

Steps:

  • RS='\\([^)]+)' captures a (...) string as record separator
  • gsub function then replaces PATTERN with pattern in matched text i.e. RT
  • ORS=RT sets ORS as the new modified RT
  • 1 prints each record to stdout

Another alternative uses a lookahead assertion in a Perl regex:

perl -pe 's/PATTERN(?=[^()]*\))/pattern/g' file
anubhava
  • Thanks. There are other tools, like awk, that can solve this, but I was just trying to use sed. – Victor Lee Sep 18 '21 at 06:59
  • You can use sed, but sed is not suitable for problems like this. Just see how simple, maintainable, and efficient these 2 solutions are compared to sed – anubhava Sep 18 '21 at 07:48
0

Can sed implement that?

Yes. But you do not want to do it in sed. Use another programming language, such as Python, Perl, or awk.

how to achieve that?

Implementing a non-greedy match is not simple in sed. In general, it consists of:

  • take a chunk of the input
  • process the chunk
  • put it in the hold space
  • shuffle the hold space with the pattern space to separate what has already been processed from what has not
  • repeat
  • shuffle with the hold space again
  • output

Anyway, the following script:

#!/bin/bash
sed <<<'outer_string_PATTERN_string(PATTERN_i_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string' '
    :loop;
    /\([^(]*\)\(([^)]*)\)\(.*\)/{
        # Lowercase the second part.
        s//\1\L\2\E\n\3/;
        # Mix with hold space.
        G;
        s/\(.*\)\n\(.*\)\n\(.*\)/\3\1\n\2/;
        # Put processed stuff into hold space
        h; s/\n.*//; x;
        # Process the other stuff again.
        s/.*\n//;
        bloop;
    };
    # Is hold space empty?
    x; /^$/!{
        # Pattern space has trailing stuff - add it.
        G; s/\n//;
        # We will print it.
        h;
        # Clear hold space
        s/.*//
    };x;
'

outputs:

outer_string_PATTERN_string(pattern_i_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
KamilCuk
  • Thanks, I need some time to digest this. Yes, maybe I do not want to do it in `sed` if this is the only way to achieve that. – Victor Lee Sep 17 '21 at 09:31
  • Does this only suit the example string? I have updated the question; maybe there is something you didn't consider? – Victor Lee Sep 17 '21 at 09:50
  • Sure I didn't. And still, the principle will be the same, just a bit more tokenization and hold space shuffling. Instead of `s//\1\L\2\E\n\3/;` there has to be: first put `\([^(]*\)\)` to hold space, then remember `\(.*\)` also in hold space, replace `PATTERN` with `pattern` in pattern space, restore `\(.*\)` and continue. – KamilCuk Sep 17 '21 at 10:16
  • Hmm, I think this is hardcoded; it isn't general and doesn't solve the question. – Victor Lee Sep 17 '21 at 10:47
0

Solved by this:

:/tmp$ sed 's/(/\n(/g' sed.txt | sed 's/)/)\n/g' | sed '/([^)]*)/s/PATTERN/pattern/g' | sed ':a;N;$!ba;s/\n//g'
outer_string_PATTERN_string(pattern_And_pattern_pattern_i)PATTERN_outer_string(i_pattern_inner)_outer_string
  • put each (...) group on its own line
  • on the lines containing (...), replace PATTERN with pattern
  • merge the lines back into one line

Thanks to: How can I replace a newline (\n) using sed?
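One caveat with the pipeline above: the final stage removes every newline in the stream, so a file with several lines would be merged into a single line. A possible workaround, sketched here under the assumption of GNU sed, is to run the same three stages once per input line; the function name `replace_per_line` is hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: runs the split / replace / join pipeline once per
# input line so that separate lines of the file are not merged together.
# Assumes GNU sed (it accepts \n in the replacement and ';' after a label).
replace_per_line() {
    while IFS= read -r line; do
        # 1) put each (...) group on its own line
        # 2) replace only on the lines that contain a (...) group
        # 3) join the pieces of this one input line back together
        printf '%s\n' "$line" |
            sed 's/(/\n(/g; s/)/)\n/g' |
            sed '/([^)]*)/s/PATTERN/pattern/g' |
            sed ':a;N;$!ba;s/\n//g'
    done
}

# Two input lines stay two output lines.
printf '%s\n' 'PATTERN(PATTERN)x' 'y(PATTERN)PATTERN' | replace_per_line
```

Spawning three sed processes per line is slow on large files; the sketch only illustrates the line-merging issue, not an efficient solution.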

Victor Lee
0

Can sed implement that?

It can be done using GNU sed and basic regular expressions (BRE):

sed '
s/)/)\n/g
:1
s/\(([^)]*\)PATTERN\([^)]*)\n\)/\1pattern\2/
t1
s/\n//g
' < file

where

  • 1st s inserts a newline after each )
  • 2nd s replaces one PATTERN inside a (...) group with pattern (the last one in the group, because * is greedy)
  • t loops back if a substitution was made
  • 3rd s strips all inserted newlines
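To try these steps on the question's sample line, the script can be wrapped in a shell function. This is only a sketch, assuming GNU sed; the function name `sed_replace_in_parens` is invented for the example:

```shell
#!/bin/sh
# Hypothetical wrapper around the script above; assumes GNU sed, which
# accepts \n in the replacement text of the s command.
sed_replace_in_parens() {
    sed '
s/)/)\n/g
:1
s/\(([^)]*\)PATTERN\([^)]*)\n\)/\1pattern\2/
t1
s/\n//g
'
}

# Each pass of the loop rewrites one PATTERN; the loop ends when no
# PATTERN is left between a "(" and the ")" + newline that closes it.
printf '%s\n' 'outer_string_PATTERN_string(PATTERN_And_PATTERN_PATTERN_i)PATTERN_outer_string(i_PATTERN_inner)_outer_string' | sed_replace_in_parens
```

Since sed processes its input one line at a time and the markers are inserted and stripped within each pattern space, multi-line files keep their line structure.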

EDIT

2nd substitute command edited according to OP's suggestion since there is no need to match \n inside ().

urznow