Please bear with me...

I have a large XML file. I need to find the string "JOBNAME=0927", then find the first line after it that contains "TASKTYPE", and change that line.

So I have to change the TASKTYPE line that comes after JOBNAME=0927. There are several hundred JOBNAME and TASKTYPE lines, each pair a different number of lines apart.

I have tried sed, awk and bash to no avail. I am sure there is a way to do it, but it is escaping me.

EXAMPLE:

JOBNAME="MYSAP#SDOR-SG-D-LATECODED-0927"
            JUL="1"
            JUN="1"
            MAR="1"
            MAXDAYS="0"
            MAXRERUN="0"
            MAXRUNS="0"
            MAXWAIT="0"
            MAY="1"
            MULTY_AGENT="N"
            NODEID="sappr2"
            NOV="1"
            OCT="1"
            PARENT_FOLDER="MYSAP#SSDOR-D-SG-LATECODED-0927"
            PRIORITY="10"
            RETRO="0"
            RULE_BASED_CALENDAR_RELATIONSHIP="O"
            RUN_AS="MYSAP"
            SEP="1"
            SHIFT="Ignore Job"
            SHIFTNUM="+00"
            SUB_APPLICATION="MYSAP"
            SYSDB="0"
            TASKTYPE="Job"
John1024
Vonedaddy
  • Give an example chunk and your desired output from that.. – heemayl Jun 01 '16 at 01:11
  • Here goes a chunk, remember there will be multiples of similar data. – Vonedaddy Jun 01 '16 at 01:16
  • It keeps telling me the text I enter is too long. – Vonedaddy Jun 01 '16 at 01:17
  • Possibly related: http://stackoverflow.com/questions/893585/how-to-parse-xml-in-bash or http://stackoverflow.com/questions/4680143/how-to-parse-xml-using-shellscript – Eric Renouf Jun 01 '16 at 01:17
  • Please [edit your question](http://stackoverflow.com/posts/37557924/edit) and add the chunk to the question. – heemayl Jun 01 '16 at 01:18
  • done, sorry, obviously new here. – Vonedaddy Jun 01 '16 at 01:18
  • The file snippet you entered is not XML... I assume it is part of the actual XML, and all the lines you entered are just attributes of the same node. Anyway... for parsing XML, you should use tools like `xmllint` or `xmlstarlet`, not `awk`/`sed`... – anishsane Jun 01 '16 at 04:17

1 Answer

Using sed

Try:

sed '/JOBNAME.*0927/,/TASKTYPE/ {s/TASKTYPE.*/TASKTYPE="NewJob"/}' largefile

This produces as output:

JOBNAME="MYSAP#SDOR-SG-D-LATECODED-0927"
            JUL="1"
            JUN="1"
            MAR="1"
            MAXDAYS="0"
            MAXRERUN="0"
            MAXRUNS="0"
            MAXWAIT="0"
            MAY="1"
            MULTY_AGENT="N"
            NODEID="sappr2"
            NOV="1"
            OCT="1"
            PARENT_FOLDER="MYSAP#SSDOR-D-SG-LATECODED-0927"
            PRIORITY="10"
            RETRO="0"
            RULE_BASED_CALENDAR_RELATIONSHIP="O"
            RUN_AS="MYSAP"
            SEP="1"
            SHIFT="Ignore Job"
            SHIFTNUM="+00"
            SUB_APPLICATION="MYSAP"
            SYSDB="0"
            TASKTYPE="NewJob"

How it works:

  • /JOBNAME.*0927/,/TASKTYPE/ {...} executes the commands in curly braces only for groups of lines that start with a line matching the regex JOBNAME.*0927 and end with the first subsequent line that matches TASKTYPE.

  • s/TASKTYPE.*/TASKTYPE="NewJob"/ replaces TASKTYPE and everything after it on the line with TASKTYPE="NewJob".
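
To check that the range only touches the job you want, here is a minimal sketch that runs the same command on a tiny two-job file. The sample data and the second job name (ending in 0928) are made up for illustration. The only change to the command is a `;` before the closing `}`, which I believe some non-GNU seds require.

```shell
# Build a small two-job sample file (hypothetical data modelled on the question).
sample=$(mktemp)
cat > "$sample" <<'EOF'
JOBNAME="MYSAP#SDOR-SG-D-LATECODED-0927"
    SYSDB="0"
    TASKTYPE="Job"
JOBNAME="MYSAP#SDOR-SG-D-LATECODED-0928"
    SYSDB="0"
    TASKTYPE="Job"
EOF

# Same range-plus-substitute logic as above. Only the TASKTYPE line that
# closes the 0927 range should change; the 0928 job is outside the range.
result=$(sed '/JOBNAME.*0927/,/TASKTYPE/ { s/TASKTYPE.*/TASKTYPE="NewJob"/; }' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Only the third line of the output should read TASKTYPE="NewJob". To change the real file rather than print to standard output, one cautious option is to redirect the output to a new file and move it into place once it looks right.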

Using awk

This awk script uses the same logic:

awk '/JOBNAME.*0927/,/TASKTYPE/ {sub(/TASKTYPE.*/, "TASKTYPE=\"NewJob\"")} 1' largefile

How it works:

  • /JOBNAME.*0927/,/TASKTYPE/ {...}

    This executes the commands in curly braces only for groups of lines that start with a line matching the regex JOBNAME.*0927 and end with the first subsequent line that matches TASKTYPE.

  • sub(/TASKTYPE.*/, "TASKTYPE=\"NewJob\"")

    This performs the substitution on the current line: TASKTYPE and everything after it become TASKTYPE="NewJob".

  • 1

    Unlike sed, awk does not, by default, print anything. This 1 is awk's cryptic shorthand for print-the-whole-line.

    In more detail, 1 is a logical condition. It evaluates to "true." We specified no action to go along with that condition. Therefore, awk performs its default action which is print-the-line: print $0.
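
If the range syntax feels too implicit, the same idea can be sketched with an explicit flag. This is a variation, not the command above: the variable names `job`, `newtype`, and `in_job` are my own, the DEMO sample data is made up, and it assumes each job begins on a line containing JOBNAME=. It also writes out `{ print }` in place of the trailing 1.

```shell
# Build a three-job sample file (hypothetical data for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
JOBNAME="DEMO-0926"
    TASKTYPE="Job"
JOBNAME="DEMO-0927"
    SYSDB="0"
    TASKTYPE="Job"
JOBNAME="DEMO-0928"
    TASKTYPE="Job"
EOF

result=$(awk -v job='0927' -v newtype='NewJob' '
    # Every JOBNAME line sets the flag (wanted job) or clears it (any other job).
    /JOBNAME=/ { in_job = ($0 ~ ("JOBNAME.*" job)) }

    # Change only the first TASKTYPE line seen while the flag is set.
    in_job && /TASKTYPE/ {
        sub(/TASKTYPE.*/, "TASKTYPE=\"" newtype "\"")
        in_job = 0
    }

    # Explicit form of the trailing 1: print every line, changed or not.
    { print }
' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because the flag is recomputed on every JOBNAME line, a matching job that happened to have no TASKTYPE line would not cause the next job's TASKTYPE to be changed, which the plain range form would do. Passing the job suffix and new value with -v also keeps the awk program itself unchanged between runs.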

John1024