I want to extract the value of a specific tag from an XML file using Unix tools. The XML is unformatted (all data on a single line), and I need to search for the tag PolNumber, which occurs multiple times on that line.

The XML is below:

<?xml version="1.0" encoding="UTF-8"?><TXLife><UserAuthRequest><UserLoginName>FirstPenn</UserLoginName><UserPswd><CryptType>None</CryptType><Pswd>None</Pswd></UserPswd><UserDate>2016-05-06</UserDate><UserTime>11:06</UserTime><VendorApp><VendorName VendorCode="FPPTB">FirstPenn</VendorName><AppName>ACORD XML Download</AppName><AppVer>1.0</AppVer></VendorApp></UserAuthRequest><TXLifeRequest><TransRefGUID>4B6BB6FB-6FA0-4678-A3A2-862E7AE7D884</TransRefGUID><TransType tc="1125"/><TransExeDate>2016-05-06</TransExeDate><TransExeTime>11:06</TransExeTime><TransMode tc="2"/><InquiryLevel tc="3"/><MaxRecords>0</MaxRecords><PendingResponseOK tc="0">False</PendingResponseOK><NoResponseOK tc="1">True</NoResponseOK><TestIndicator tc="0">False</TestIndicator><OLifE Version="2.7"><SourceInfo><CreationDate>2016-05-06</CreationDate><SourceInfoName>First Penn-Pacific</SourceInfoName><SourceInfoDescription>Pending Case Status</SourceInfoDescription><FileControlID>1223232304</FileControlID></SourceInfo></Holding><Holding id="HLD_4902160"><HoldingTypeCode tc="2"/><HoldingStatus tc="4"/><AsOfDate>2016-05-05</AsOfDate><Policy CarrierPartyID="LLCTB_4902160"><CarrierCode>LLCTB</CarrierCode><PolNumber>4902160</PolNumber><LineOfBusiness tc="1">Life</LineOfBusiness><ProductType tc="4"/><ProductCode>VLON14      </ProductCode><PlanName>VLON14      </PlanName><PolicyStatus tc="24">Approved, not issued</PolicyStatus><Jurisdiction tc="56"/><EffDate>2016-02-18</EffDate><PaymentMode tc="9">Single Payment</PaymentMode><PaymentAmt>62336.0000</PaymentAmt><Life><TargetPremAmt>5759.9700</TargetPremAmt><TotalRolloverAmt>0.0000</TotalRolloverAmt><FaceAmt>261579.0000</FaceAmt><Coverage id="COV_4902160_1"><IndicatorCode tc="1"/><LivesType tc="2147483647"/><LifeParticipant PartyID="INS_4902160_1"><LifeParticipantRoleCode tc="1"/><IssueAge>53</IssueAge><IssueGender tc="1"/><TobaccoPremiumBasis tc="1">Non Smoker</TobaccoPremiumBasis><PermTableRating tc="1"/><UnderwritingClass tc="2">Preferred 
risk</UnderwritingClass></LifeParticipant></Coverage></Life><Holding id="HLD_4902270"><HoldingTypeCode tc="2"/><HoldingStatus tc="4"/><AsOfDate>2016-05-06</AsOfDate><Policy CarrierPartyID="LLCTB_4902270"><CarrierCode>LLCTB</CarrierCode><PolNumber>4902270</PolNumber><LineOfBusiness tc="1">Life</LineOfBusiness><ProductType tc="4"/><ProductCode>VLON14      </ProductCode><PlanName>VLON14      </PlanName><PolicyStatus tc="8">Pending Issue</PolicyStatus><Jurisdiction tc="17"/><EffDate>2016-02-24</EffDate><PaymentMode tc="1">Annual</PaymentMode><PaymentAmt>2532.0000</PaymentAmt><Life><TargetPremAmt>7422.0000</TargetPremAmt><TotalRolloverAmt>0.0000</TotalRolloverAmt><FaceAmt>200000.0000</FaceAmt><Coverage id="COV_4902270_1"><IndicatorCode tc="1"/><LivesType tc="2147483647"/><LifeParticipant PartyID="INS_4902270_1"><LifeParticipantRoleCode tc="1"/><IssueAge>69</IssueAge><IssueGender tc="2"/><TobaccoPremiumBasis tc="1">Non Smoker</TobaccoPremiumBasis><PermTableRating tc="1"/><UnderwritingClass tc="1">Standard Risk</UnderwritingClass></LifeParticipant></Coverage></Life>

The following grep command works as expected:

grep -oP "<PolNumber>[0-9]*</PolNumber>" samp.xml | grep -oe '\([0-9]*\)'

However, it works only on online Unix shell websites, not on my machine, where it says "Grep Invalid option --o". I am not sure whether this is a version problem, but I need a fix that works with my current Unix. Could you please help me with that?

Thanks in advance,
Manivannan

Mani
  • What's your version/flavor of Unix? Have you tried finding a similar option in the grep man page for your OS? – ZnArK May 18 '16 at 13:20
  • Are you expecting a solution using only `grep`? Is `awk` supported? – Inian May 18 '16 at 13:39

2 Answers

Using simple utilities:

tr "<" "\n" < samp.xml | grep "^PolNumber" | cut -d">" -f2
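
A small sketch of how the three stages of this pipeline behave, run on a hypothetical two-policy sample instead of samp.xml (the sample string and the variable names are assumptions made for illustration). It uses only tr, grep without -o, and cut, so it should not depend on GNU-specific options:

```shell
# Hypothetical two-policy sample, written on a single line like samp.xml.
sample='<Policy><PolNumber>4902160</PolNumber></Policy><Policy><PolNumber>4902270</PolNumber></Policy>'

# Stage 1 (tr):   turn every "<" into a newline, so each tag starts its own line.
# Stage 2 (grep): keep only the lines that begin with the opening tag name;
#                 closing tags begin with "/" and are dropped.
# Stage 3 (cut):  print the text after the ">" that ends the opening tag.
result=$(printf '%s\n' "$sample" | tr '<' '\n' | grep '^PolNumber' | cut -d'>' -f2)

printf '%s\n' "$result"
```

Note that `^PolNumber` would also match any other tag whose name merely starts with PolNumber; the posted XML contains no such tag.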
Walter A

Parsing XML with non-XML parsers such as grep or sed is usually a bad idea.

Anyhow, here's a quick and dirty solution with sed:

sed 's#\(<PolNumber>[0-9]*\)</PolNumber>#\1\n#g' samp.xml | grep '<PolNumber>' | sed 's#.*<PolNumber>\([0-9]*\)$#\1#'

It only works if your XML is on one line.
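
One caveat with the sed command above: `\n` in the replacement part is left unspecified by POSIX and works in GNU sed, so it may not behave the same on every Unix. As a hedged alternative, and since awk was raised in the comments, here is a minimal sketch that splits each line on `<` and `>` instead. The function name `extract_polnumbers` and the sample string are hypothetical, and the sketch assumes the PolNumber opening tag carries no attributes, as in the posted XML:

```shell
# extract_polnumbers is a hypothetical helper name, not something from the thread.
# It reads XML on stdin and prints the text that follows each <PolNumber> tag.
extract_polnumbers() {
    # Split every input line on "<" and ">", so tag names and text become fields.
    awk -F'[<>]' '{
        for (i = 1; i < NF; i++)
            if ($i == "PolNumber")   # opening tag without attributes
                print $(i + 1)       # the field right after it is the value
    }'
}

# Hypothetical two-policy sample on a single line.
sample='<Policy><PolNumber>4902160</PolNumber></Policy><Policy><PolNumber>4902270</PolNumber></Policy>'
result=$(printf '%s\n' "$sample" | extract_polnumbers)
printf '%s\n' "$result"
```

With the real file this would be called as `extract_polnumbers < samp.xml`. Because awk processes each line separately, the sketch should also work when the XML spans several lines, as long as a PolNumber tag and its value sit on the same line.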

hellerpop