I have an issue where in our application while writing a python script we use the file data as string only and output as string only. and then that output string we use as a file e.g xml or csv Below is the e.g of an tab delimited .txt file format which will be coming from the customer into our application. H is an header which will come only once and D is line level data which can be repeatative. Below is the specification about what these fields are. How can I convert this data to an csv format or an xml format. I saw lot of examples on stackoverflow but all are about open the file and load the file. Which is not working in my case.
H CustID CustPO OrderDate UserID EmergencyFlag
D <null> PartNumber Description Quantity
D <null> PartNumber Description Quantity
D <null> PartNumber Description Quantity
Expected Input:
H 1520982 wpg-A10026 2020-11-24 STG N
D 52892 LAMP RED OVAL STOP TAIL TURN 2
D AL919338 ECU MODULE 2S/1M M/H PLC SELECT 1
Expected CSV Output:
H,1520982,wpg-A10026,2020-11-24,STG,N
D,,52892,LAMP RED OVAL STOP TAIL TURN,2
D,,AL919338,ECU MODULE 2S/1M M/H PLC SELECT,1
Expected XML Output:
<?xml version="1.0" encoding="utf-8"?>
<Record>
<Header>
<RecordType>H</RecordType>
<CustID>1520982</CustPO>
<CustPO>wpg-A10026</CustPO>
<OrderDate>2020-11-24</OrderDate>
<UserID>STG</UserID>
<EmergencyFlag>N</EmergencyFlag>
</Header>
<Details>
<RecordType>D</RecordType>
2nd field is null
<PartNumber>52892</PartNumber>
<Description>LAMP RED OVAL STOP TAIL TURN</Description>
<Quantity>2</Quantity>
</Details>
<Details>
<RecordType>D</RecordType>
2nd field is null
<PartNumber>AL919338</PartNumber>
<Description>ECU MODULE 2S/1M M/H PLC SELECT</Description>
<Quantity>1</Quantity>
</Details>
</Record>
The output I got but somewhat wrong. If you see the description field its one entire field but it was replaced with commas. But not the actual tab spaces.
H 1520982 wpg-A10026 2020-11-24 STG N
D 52892 LAMP,,RED,OVAL,STOP,TAIL,TURN 2
D AL919338 ECU,MODULE,2S/1M,M/H,PLC,SELECT 1.
So What I did i corrected your code as content[i] = filter(bool, content[i].split("\t")) so it can actually give the comma where there was a tab delimit. For now it seems bit working but my second field under Details D was null so i wanted it as two commas ,, but that did not appear any hint for that? Thank you very much for your time. appreciated
The current output I got is this below With two issues still remaining after "D" there is a null value need to check if that is actually required to appear in the output. 2nd issue the Description field already had one comma, So need to check with customer if they can stop sending comma or other characters in description field. Or for better accuracy i need to convert this to an xml. But Still Thank you very much this is good to start with H,1520982,wpg-A10026,2020-11-24,STG,N D,52892,LAMP, RED OVAL STOP TAIL TURN,2, D,AL919338,ECU MODULE 2S/1M M/H PLC SELECT,1,