Multiline pattern replacement with sed in bash script

Question

Attempting to replace multiple lines with two characters },

if the following can be searched and replaced with the above, then the problem would be solved.

The pattern looks like this when encountered:

        },

      ]
    }
  }
}


{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [

These are the methods I've tried so far

    #using \ and spaces
combinedDSL=$(echo "$initialDSLs"|sed 's/}\
      ]\
    }\
  }\
}\
\
*\
{\
  "query": {\
    "bool": {\
      "minimum_should_match": 1,\
      "should": [\
/},/' )

echo "$combinedDSL"

#using line breaks \n 

combinedDSL2=$(echo "$initialDSLs|"sed N 's/}\n]\n}\n}\n}\n\n*\n{\n"query": {\n"bool": {\n"minimum_should_match": 1,\n"should": [\n/},/')
echo "$combinedDSL2"

here is the full context:

{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
    

        {
          "wildcard": {
            "author_place": "*city*"
          }
        },
        
        {
          "wildcard": {
            "meta.title.results": "*state*"
          }
        },

      ]
    }
  }
}


{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
    

            {
            "wildcard": {
                "author": "*cat*"
            }
            },
            
            {
            "wildcard": {
                "meta.hashtag.results": "*Monkey*"
            }
}
      ]
    }
  }
}

[edit] your question to show the target lines in-context surrounded by lines you do not want matched and also add the expected output given that input. — Ed Morton, Jun 30 '21 at 03:38
That looks like JSON, which means you need to use a JSON-aware tool like `jq` to work with it. — Shawn, Jun 30 '21 at 03:44
Yes it is json. @Shawn I am trying to replace those lines which would allow a smooth combination of multiple queries — stackmacstacky, Jun 30 '21 at 03:46
There is no part of the "context" you provided that matches the string you want to replace. There are different numbers of blanks lines between each and a comma in one that isn't present in the other, and a missing newline and different indentation. Please clean up your example so the string you want to find exists in your sample input. — Ed Morton, Jun 30 '21 at 04:03
@EdMorton thanks for letting me know. I have cleaned it up as much as possible while still retaining the context. As for the spacing, I included a wildcard trying to catch a number of newlines which may throw off the script ( would wildcards still be able to work?) Thanks for the help! — stackmacstacky, Jun 30 '21 at 04:10
with much work you can probably make a complex regexp that will do the job for your specific JSON-file. But in my view it's wrong tool for the project. It's better to learn how to use some of the many JSON tools out there, and be more flexible next time. — MyICQ, Jun 30 '21 at 07:05
@stackmacstacky if you want to include wildcards in the text you're searching for then that text has to be a regexp and then you have to find all other regexp metachars in that text and escape them to make them literal, you can't just replace a literal string with some other literal string. So it's a tradeoff. Which do you really want? — Ed Morton, Jun 30 '21 at 12:54
This is one of the reasons it's important not to use the word `pattern` when discussing matching text - when you say "pattern" it could mean anything but there are no "patterns" in text matching, there are only "regexps" and "strings" and they are very different from each other, so you have to decide if you want to a regexp or sting match otherwise you end up with a fragile mush thinking you have some mythical "pattern". See [how-do-i-find-the-text-that-matches-a-pattern](https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern). — Ed Morton, Jun 30 '21 at 13:45

score 1 · Answer 1 · answered Jun 30 '21 at 22:32

$ cat tst.sh
#!/usr/bin/env bash

old='        },

      ]
    }
  }
}


{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [
'

new='
},'

# https://stackoverflow.com/questions/29613304/is-it-possible-to-escape-regex-metacharacters-reliably-with-sed
# explains how in the script below we turn "old" above into a robust
# regexp that's forgiving of white space differences in the target file
# and deactivate the possible backreference in "new".

old="$old" new="$new" awk '
    BEGIN {
        old = ENVIRON["old"]
        new = ENVIRON["new"]

        # Deactivate possible regexp metachars in "old"
        gsub(/[^^\\[:space:]]/,"[&]",old) # deactivate all non-space chars except ^ and \
        gsub(/\^/,"\\^",old)            # deactivate ^
        gsub(/\\/,"\\\\",old)           # deactivate \

        # Make any literal white space present in "old" match all white space
        gsub(/[[:space:]]+/,"[[:space:]]+",old)

        # Deactivate possible backreference metachars in "new"
        gsub(/&/,"\\&",new)             # deactivate &
    }
    {
        # Create a single input record out of the whole input file
        rec = (NR>1 ? rec RS : "") $0
    }
    END {
        gsub(old,new,rec)
        print rec
    }
' "${@:--}"

$ ./tst.sh file
{
  "query": {
    "bool": {
      "minimum_should_match": 1,
      "should": [


        {
          "wildcard": {
            "author_place": "*city*"
          }
        },

        {
          "wildcard": {
            "meta.title.results": "*state*"
          }
},{
            "wildcard": {
                "author": "*cat*"
            }
            },

            {
            "wildcard": {
                "meta.hashtag.results": "*Monkey*"
            }
}
      ]
    }
  }
}

Multiline pattern replacement with sed in bash script

1 Answers1