
This question is related to this other question I asked earlier today: Find and replace text with all-inclusive wild card

I have a text file like this:

I want= to keep this
        This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
        <this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone> 
       A novice programmer walked into a "BAR2" descript keepthis
        and this even more text, let's keep it
    <I actually want this>
    and this= too.

When I run this script with sed -f script.sed file.txt:

# Check for "aff"
/\baff\b/ {
    # Define a label "a"
    :a
    # If the line does not contain "desc"
    /\bdesc\b/!{
        # Get the next line of input and append
        # it to the pattern buffer
        N
        # Branch back to label "a"
        ba
    }
    # Replace everything between aff and desc
    s/\(\baff\)\b.*\b\(desc\b\)/\1TEST DATA\2/
}

I get this as my output:

       I want= to keep this
        This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
        <this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone> 
       A novice programmer walked into a "BAR2" descript keepthis
        and this even more text, let's keep it
    <I actually want this>
    and this= too.

However, simply changing the search strings from aff and desc to FOO1 and BAR2:

# Check for "FOO1"
/\bFOO1\b/ {
    # Define a label "a"
    :a
    # If the line does not contain "BAR2"
    /\bBAR2\b/!{
        # Get the next line of input and append
        # it to the pattern buffer
        N
        # Branch back to label "a"
        ba
    }
    # Replace everything between FOO1 and BAR2
    s/\(\bFOO1\)\b.*\b\(BAR2\b\)/\1TEST DATA\2/
}

gives the expected output:

I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1TEST DATABAR2" descript keepthis
    and this even more text, let's keep it
<I actually want this>
and this= too.

I am completely stumped about what is going on here. Why should searching between FOO1 and BAR2 work differently from the exact same script with aff and desc?

Stonecraft

1 Answer


The end marker should be \bdesc instead of \bdesc\b.

Note the \b in the pattern: it matches a word boundary. Your text contains the word descript, but not desc as a word of its own, so \bdesc\b never matches.
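To see the difference in isolation, here is a small sketch that runs both patterns against the one line of the sample that contains descript. It assumes GNU sed (\b is a GNU extension); the variable names strict and loose are only for illustration.

```shell
# Assumes GNU sed, where \b is a word-boundary extension.
line='A novice programmer walked into a "BAR2" descript keepthis'

# \bdesc\b needs a boundary after "desc". In "descript" the next
# character is "r", a word character, so nothing is printed.
strict=$(printf '%s\n' "$line" | sed -n '/\bdesc\b/p')

# \bdesc only needs a boundary before "desc", so "descript" matches
# and the line is printed.
loose=$(printf '%s\n' "$line" | sed -n '/\bdesc/p')

printf 'strict: [%s]\n' "$strict"
printf 'loose:  [%s]\n' "$loose"
```

Because the strict pattern never matches, the loop in the script keeps calling N until the input runs out, and GNU sed then prints the accumulated lines unchanged.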

Your previous question made me assume that you wanted whole-word matching. If you don't care about word boundaries, remove the \b escape sequences completely.
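As a sketch of the fix, the script below drops the trailing \b from desc in both places where it appears (the loop condition and the s command). It assumes GNU sed, and the input is a shortened, made-up stand-in for the file in the question, not the original data.

```shell
# Assumes GNU sed (\b is a GNU extension). The input is a shortened,
# hypothetical stand-in for the file in the question.
input='keep this
start aff FOO1 delete me
delete this line too
walked into a "BAR2" descript keepthis
and keep this'

result=$(printf '%s\n' "$input" | sed '
/\baff\b/ {
    :a
    # Loop until the buffer contains a word that starts with "desc"
    /\bdesc/!{
        N
        ba
    }
    # Replace everything between aff and the start of that word
    s/\(\baff\)\b.*\b\(desc\)/\1TEST DATA\2/
}')

printf '%s\n' "$result"
```

With this input the three lines from aff through descript collapse into the single line "start affTEST DATAdescript keepthis", and the first and last lines pass through untouched.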

hek2mgl
  • OK, that takes care of that... and words are defined by white space, not by the presence of non-word characters? I'm pretty sure I tried matching the whole words the first few times I tried it, but if something like `thistext="thattext"` would be treated as one word (and `thistext` would not be found), then I think that is where I went wrong (although also without understanding what the `\b`'s were doing anyway). – Stonecraft Aug 24 '15 at 08:11
  • Indeed, sometimes word boundaries don't fit when parsing data where meaningful terms are concatenated by *word* characters. – hek2mgl Aug 24 '15 at 08:35
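For what it's worth, a quick check of the case raised in the comments. In GNU sed, word characters are letters, digits and the underscore, so = and " do create word boundaries, while _ does not. This is a minimal sketch assuming GNU sed; foo_bar is a made-up example string.

```shell
# Assumes GNU sed. Word characters are letters, digits and underscore.

# "=" and the double quote are non-word characters, so \b matches on
# both sides of thistext and the word is found.
a=$(printf '%s\n' 'thistext="thattext"' | sed 's/\bthistext\b/[&]/')

# "_" is a word character, so there is no boundary between "foo" and
# "_bar" and the line is left unchanged.
b=$(printf '%s\n' 'foo_bar' | sed 's/\bfoo\b/[&]/')

printf '%s\n%s\n' "$a" "$b"
```

So `thistext` in `thistext="thattext"` is matched as a whole word; the problem only arises when the neighbouring character is itself a word character, as in descript.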