Hi, I'm new to awk. How can I produce the output shown below from an XML file using awk or gawk?

xml file:

  <Splits>
    <ImageNum>595</ImageNum>
    <SplitPos>5343</SplitPos>
    <SplitNextTop>5343</SplitNextTop>
  </Splits>
  <Splits>
    <ImageNum>632</ImageNum>
    <SplitPos>2777</SplitPos>
    <SplitNextTop>2718</SplitNextTop>
  </Splits>
  <Splits>
    <ImageNum>632</ImageNum>
    <SplitPos>5322</SplitPos>
    <SplitNextTop>5322</SplitNextTop>
  </Splits>
  <Splits>
    <ImageNum>640</ImageNum>
    <SplitPos>2786</SplitPos>
    <SplitNextTop>2700</SplitNextTop>
  </Splits>
  <Splits>
    <ImageNum>640</ImageNum>
    <SplitPos>5319</SplitPos>
    <SplitNextTop>5320</SplitNextTop>
  </Splits>
  <Splits>
    <ImageNum>31</ImageNum>
    <SplitPos>2798</SplitPos>
    <SplitNextTop>2760</SplitNextTop>
  </Splits>

Desired output:

ImageNum    SplitPos    SplitNextTop    SplitPos    SplitNextTop
595         5343        5343
632         2777        2718            5322        5322
640         2786        2700            5319        5320
31          2798        2760

Thank you very much.

User555
  • Welcome to Stack Overflow. [SO is a question and answer page for professional and enthusiast programmers](https://stackoverflow.com/tour). Please add your own code to your question. You are expected to show at least the amount of research you have put into solving this question yourself – Cyrus Feb 14 '21 at 03:25
  • First, that's not valid XML (no root node). Next, you want to use an XML parser to parse XML data. Look into [tag:xmlstarlet] – glenn jackman Feb 14 '21 at 03:43
  • It seems like you are attempting to use regular expressions to parse XML. Generally, this is not something meant to be solved using regex. Consider using an XML parser for this task. – costaparas Feb 14 '21 at 03:54
  • [Reading XML Data with POSIX AWK](http://gawkextlib.sourceforge.net/xml/gawk-xml.html#Reading-XML-Data-with-POSIX-AWK) (offsite link) – James Brown Feb 14 '21 at 09:44
  • Also check [this](https://stackoverflow.com/a/1732454/1394729) out ... – tink Feb 16 '21 at 02:26

1 Answer

I have some doubts about your expected output format, but this may be useful:

    awk -f step1.awk data1.xml | awk -f step2.awk

step1.awk:

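    { ### main
        gsub(/</, " <", $0)   ### put a space before every "<" so tags become separate fields
        gsub(/>/, "> ", $0)   ### put a space after every ">"

        gsub(/<Splits>/, "", $0)              ### drop the opening record tag (was /<\Splits>/; in gawk "\S" means "any non-space character")
        gsub(/<\/Splits>/, "_newline_", $0)   ### closing record tag becomes the new-record signal

        for (i = 1; i <= NF; ++i) {
            gsub(/^<\/.*>$/, "", $i)   ### eliminate the closing tags: </xxxx>
        }

        for (i = 1; i <= NF; ++i) {
            gsub(/[<>]/, "", $i)       ### strip the remaining "<" and ">"
        }

        print $0   ### prints "name value", "_newline_", or an empty line
    } ### main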

step2.awk:

    BEGIN {
        n_col=0       ### column number within the current record
        n_line=0      ### number of records seen so far
        s_line=""     ### values of the current record (comma separated)
        n_maxcols=0   ### largest column count seen in any record
    }

    { ### main
        if (NF==2) {   ### a "name value" line produced by step1.awk
            ++n_col
            s_line=s_line sprintf("%s,",$2)  ### column 2 has the value; s_line collects the values (comma separated)
            a_head[n_col]=$1    ### a_head maps column number to column name
            if (n_col>n_maxcols) n_maxcols=n_col   ### keep the maximum, not just the last count
        }
        if ($0~/_newline_/) {
            ++n_line
            a_lines[n_line]=s_line   ### a_lines is an array of finished lines
            s_line=""  ### reset line
            n_col=0    ### reset cols
        }
    } ### main

    END {
        for (n=1;n<=n_maxcols;++n) {  ### print header (comma separated)
            printf("%s,",a_head[n])
        }
        print "" ### end the header line

        for (n=1;n<=n_line;++n) {  ### print lines with values
            printf("%s\n",a_lines[n])
        }
    }

I think comma-separated output is more readable:

ImageNum,SplitPos,SplitNextTop,
595,5343,5343,
632,2777,2718,
632,5322,5322,
640,2786,2700,
640,5319,5320,
31,2798,2760,

I hope the code comments make the scripts easy to follow. The closing "Splits" tag is used as the "newline" (new-record) signal.
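If you do want the layout from the question (one row per ImageNum, with repeated SplitPos/SplitNextTop pairs appended to that row), the same "closing tag ends a record" idea can be sketched in a single awk pass. This is only a sketch under some assumptions: every element sits on its own line exactly as in the sample, the field separator is set to `[<>]` so that `$2` is the tag name and `$3` is the value, and `group_splits` is just a hypothetical wrapper name. It is not a real XML parser, so for anything less regular than this sample the comments recommending one still apply.

```sh
# group_splits: hypothetical helper; prints one row per ImageNum, in first-seen order.
group_splits() {
  awk -F'[<>]' '
    # With -F"[<>]", a line like "<SplitPos>5343</SplitPos>" splits into
    # $2 = tag name and $3 = value.
    $2 == "ImageNum"     { img = $3 }
    $2 == "SplitPos"     { pos = $3 }
    $2 == "SplitNextTop" { top = $3 }
    $2 == "/Splits" {                        # closing tag ends one record
      if (!(img in row)) order[++n] = img    # remember first-seen order
      row[img] = row[img] sprintf("%-12s%-16s", pos, top)
      if (++cnt[img] > max) max = cnt[img]   # widest row decides the header
    }
    END {
      hdr = sprintf("%-12s", "ImageNum")
      for (i = 1; i <= max; i++)
        hdr = hdr sprintf("%-12s%-16s", "SplitPos", "SplitNextTop")
      sub(/ +$/, "", hdr)                    # drop trailing padding
      print hdr
      for (i = 1; i <= n; i++) {
        line = sprintf("%-12s%s", order[i], row[order[i]])
        sub(/ +$/, "", line)
        print line
      }
    }
  ' "$@"
}
```

Call it as `group_splits data1.xml`. Rows are keyed on ImageNum, so records with the same ImageNum are merged even if they are not adjacent in the file; the column widths (12 and 16) are chosen only to mimic the spacing in the question.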