1

I want to remove the "error_mail" and "succeed_mail" nodes from multiple similar XML files using sed or awk.

Using sed, I tried the command below, but it's not working:

sed -i /<action name="succeed_mail">/,/<\/action>/d *.xml

Here is a sample input file (test.xml):

 <workflow>
    <action name="start"
    -----
    -----
       </action>
    
    <action name="error_mail">
            <email xmlns="uri:oozie:email-action:0.1">
              <to>abc@xyz.com</to>
              <cc>abc@xyz.com</cc>
              <subject>Batch Failed</subject>
              <body>Batch Failed at ${node}</body>
            </email>
            <ok to="killjob"/>
            <error to="killjob"/>
          </action>
        <action name="succeed_mail">
            <email xmlns="uri:oozie:email-action:0.1">
              <to>abc@xyz.com</to>
              <cc>abc@xyz.com</cc>
              <subject>Batch Succeed</subject>
              <body>Batch completed</body>
            </email>
            <ok to="end"/>
            <error to="end"/>
          </action></r>
    </workflow>

Desired output (test.xml):

<workflow>
<action name="start"
-----
-----
   </action>
</workflow>
Ed Morton
pravek
    Always enclose scripts in quotes: `sed 'foo'`, not `sed foo`. Also, "its not working" is the worst possible problem statement - tell us in what way it's not working (wrong output, no output, error messages, etc.) so we can best help you with the problem you have rather than possibly some other problem we **think** you might have. – Ed Morton Dec 11 '20 at 16:09

3 Answers

0

Experts always advise using tools like xmlstarlet to parse XML files, but since the OP is using sed, here is an awk solution. Fair warning: this is written for the shown samples ONLY; if your files look different, it may not work.

awk '
/^ +<\/action>/ && foundSuccess{
  foundSuccess=""
  next
}
/^ +<\/action>/ && foundError{
  foundError=""
  next
}
/^ +<action name="error_mail">$/{
  foundError=1
}
/^ +<action name="succeed_mail">/{
  foundSuccess=1
}
NF && !foundError && !foundSuccess
' Input_file

Explanation: a commented version of the script above.

awk '                              ## Start of the awk program.
/^ +<\/action>/ && foundSuccess{   ## If the line is an indented </action> and foundSuccess is set:
  foundSuccess=""                  ## clear foundSuccess,
  next                             ## and skip the line so the closing tag is not printed.
}
/^ +<\/action>/ && foundError{     ## If the line is an indented </action> and foundError is set:
  foundError=""                    ## clear foundError,
  next                             ## and skip the line so the closing tag is not printed.
}
/^ +<action name="error_mail">$/{  ## If the line is exactly an indented <action name="error_mail">:
  foundError=1                     ## set foundError to 1.
}
/^ +<action name="succeed_mail">/{ ## If the line starts with spaces followed by <action name="succeed_mail">:
  foundSuccess=1                   ## set foundSuccess to 1.
}
NF && !foundError && !foundSuccess ## Print the line only if it is not empty and neither flag is set.
' Input_file                       ## Name of the input file.

NOTE: To process multiple XML files, pass *.xml in place of Input_file; on its own this will not save the changes in place. For in-place editing use GNU awk (4.1 or later) and change awk to awk -i inplace in the code above. It is safer to test on a few files before running with the inplace option. See this link for how to do in-place editing with awk while keeping a backup of Input_file: https://stackoverflow.com/a/16529730/5866580
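For an awk that lacks the inplace extension, one portable alternative is to keep a backup copy of each file and write the filtered output over the original. This is only a minimal sketch under stated assumptions: the files sit in the current directory, `strip_mail_actions` and `edit_all_in_place` are hypothetical helper names, and the filter is a compact flag-based variant rather than the exact script above (any filter that reads a file and prints to stdout could be substituted).

```shell
#!/bin/sh
# strip_mail_actions is a hypothetical helper name, not part of any tool.
# It prints FILE without the error_mail/succeed_mail action blocks,
# assuming each opening and closing tag sits on its own line.
strip_mail_actions() {
  awk '
    # An unwanted block starts: begin skipping.
    /<action name="(error|succeed)_mail">/ { skip = 1 }
    # Print every line that is outside an unwanted block.
    !skip
    # A closing tag ends the block: stop skipping from the next line on.
    /<\/action>/ { skip = 0 }
  ' "$1"
}

# edit_all_in_place (also hypothetical) rewrites every .xml file in the
# current directory, keeping the original as FILE.bak.
edit_all_in_place() {
  for f in *.xml; do
    [ -f "$f" ] || continue            # no .xml files: the glob stays literal
    cp -p "$f" "$f.bak" &&
      strip_mail_actions "$f.bak" > "$f"
  done
}
```

Calling `edit_all_in_place` from the directory that holds the files applies the change; the `.bak` copies can be removed once the results have been checked.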

RavinderSingh13
  • Thanks @RavinderSingh13 ! it throws below error . awk -i inplace ' > /^ +<\/action>/ && foundSuccess{ > foundSuccess="" > next > } > /^ +<\/action>/ && foundError{ > foundError="" > next > } > /^ +$/{ > foundError=1 > } > /^ +/{ > foundSuccess=1 > } > NF && !foundError && !foundSuccess > ' redraw_workflow_curve2.xml Usage: awk [POSIX or GNU style options] -f progfile [--] file ... Usage: awk [POSIX or GNU style options] [--] 'program' file ... – pravek Dec 11 '20 at 08:32
  • @PraveenKumar, when I checked this on a single file it worked perfectly fine. The error is not clear from your comment; kindly copy/paste it clearly, test on a single file, and let me know. – RavinderSingh13 Dec 11 '20 at 09:18
  • @PraveenKumar what is that `>` doing at the start of each line of the script in your comment? – Ed Morton Dec 11 '20 at 15:42
  • @PraveenKumar, Hi Praveen, please do check mine and Ed sir's answers and let us know how it went? – RavinderSingh13 Dec 12 '20 at 11:30
0

You didn't tell us in what way "it's not working" so I'm assuming you either don't know how to use | in a regexp or don't know you have to quote your scripts.

With a sed that has -E to enable EREs:

$ sed -E '/<action name="(succeed|error)_mail">/,/<\/action>/d' file
 <workflow>
    <action name="start"
    -----
    -----
       </action>

    </workflow>

or with any awk:

$ awk '/<action name="(succeed|error)_mail">/{f=1} !f; /<\/action>/{f=0}' file
 <workflow>
    <action name="start"
    -----
    -----
       </action>

    </workflow>

That is, of course, fragile and will fail for various other layouts of the same XML which is why use of XML-aware tools is always advised.
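
As one illustration of the XML-aware route, xmlstarlet can delete the nodes by XPath regardless of how the file is laid out. This is a sketch only, assuming xmlstarlet is installed and the files are well-formed XML (the sample in the question, with its `-----` placeholder lines, is not); `delete_mail_actions` is a hypothetical wrapper name.

```shell
#!/bin/sh
# delete_mail_actions is a hypothetical wrapper; it assumes xmlstarlet is
# installed and that FILE is well-formed XML.
delete_mail_actions() {
  # ed = edit mode, -d = delete every node matching the XPath expression.
  xmlstarlet ed \
    -d '//action[@name="error_mail" or @name="succeed_mail"]' "$1"
}
```

The result goes to stdout, e.g. `delete_mail_actions test.xml > test.new.xml`; xmlstarlet's `-L` option to `ed` edits the file in place instead.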

Ed Morton
0

Had a similar need. My process:

  1. convert xml to a single line.
  2. put each <tag> ... </tag> element on a line of its own
  3. grep -v tag (or any other string, as desired)
  4. xmllint --format
  5. qed

This method is quite generic. To convert the XML to a single line, use `tr -d '\n'`. Below is a csh script for step 2; it accepts XML piped on stdin:

>cat xmlsinglenewline
#!/bin/csh -f
# $1 is the tag
# Usage: <command>  "tag"
sed "s/<$1/\n\<$1/g" | sed "s/<\/$1>/\<\/$1\>\n/g"

Caveat: Cannot handle nested (same) tag.
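
Steps 1 to 3 can be sketched as a single POSIX sh pipeline for the `<action>` elements from the question. This is an illustration under assumptions, not the script above: `drop_actions` is a hypothetical helper name, the tag name and the grep pattern are hard-coded, and the same caveat about nested tags applies.

```shell
#!/bin/sh
# drop_actions is a hypothetical helper sketching steps 1-3 for the
# <action> elements in the question. It prints the result to stdout.
drop_actions() {
  # 1. Join the whole file into a single line.
  # 2. Start a new line before each <action and after each </action>,
  #    so every action element sits on a line of its own
  #    (a backslash followed by a newline inserts a newline in sed).
  # 3. Drop the lines holding the unwanted action elements.
  tr -d '\n' < "$1" |
    sed -e 's/<action /\
<action /g' -e 's/<\/action>/&\
/g' |
    grep -v -E '<action name="(error|succeed)_mail">'
}
```

For step 4, the output can be piped on, for example `drop_actions test.xml | xmllint --format -`, assuming xmllint is installed and the remaining XML is well-formed.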

Laurel