I have a xml file with the following data.
<record record_no = "2" error_code="100">"18383531";"22677833";"21459732";"41001";"394034";"0208";"Prime Lending - ;Corporate - 2201";"";"Prime Lending - Lacey - 2508";"Prime Lending - Lacey - 2508";"1";"rrvc";"Tiffany Poe";"HEIDI";"BUNDY";"000002274";"2.0";"18.0";"2";"362661";"Rejected by IRS";"A1AAA";"20160720";"1021";"HEDI & Bundy";"4985045838";"PPASSESS";"Web";"3683000826";"823";"IC W2";"";"";"";"";"Rapid_20160801_Monthly.txt";"20160720102100";"";"20160803095309";"286023";"RGT";"1";"14702324400223";"14702324400223";"0";"OMCProcessed"
I'm using following code:
cat RR_00404.fin.bc_lerr.xml.bc| awk 'BEGIN { FS=OFS=";" }/<record/ { gsub(/"/,"\""); gsub(/.*=" ">.*/,"",$1);print $1,$40,$43,$46 ,"'base_err_xml'", "0",$7; }'
The idea is to do the following:
- Replace
"e;
with"
- Extract the
error_code
- Print
"
and;
seperated values. - Use
sqlldr
to load ( not to worry about this).
Problem to solve:
- There is
;
within the text. e.gPrime Lending -
;Corporate - 2201
- There's
&
Output:
100;"20160803095309";"1";"1";"base_err_xml";"0";"Prime Lending
100;"286023";"14702324400223";"OMCProcessed";"base_err_xml";"0";"Prime Lending - Corporate - 2201"
100;"286024-1";"";"OMCProcessed";"base_err_xml";"0";"Prime Lending - Corporate - 2201"