1

I have an issue where in our application while writing a python script we use the file data as string only and output as string only. and then that output string we use as a file e.g xml or csv Below is the e.g of an tab delimited .txt file format which will be coming from the customer into our application. H is an header which will come only once and D is line level data which can be repeatative. Below is the specification about what these fields are. How can I convert this data to an csv format or an xml format. I saw lot of examples on stackoverflow but all are about open the file and load the file. Which is not working in my case.

H   CustID   CustPO   OrderDate   UserID   EmergencyFlag
D   <null>   PartNumber   Description   Quantity
D   <null>   PartNumber   Description   Quantity
D   <null>   PartNumber   Description   Quantity

Expected Input:

H   1520982 wpg-A10026  2020-11-24  STG N
D       52892   LAMP RED OVAL STOP TAIL TURN    2   
D       AL919338    ECU MODULE 2S/1M M/H PLC SELECT 1

Expected CSV Output:

H,1520982,wpg-A10026,2020-11-24,STG,N
D,,52892,LAMP RED OVAL STOP TAIL TURN,2 
D,,AL919338,ECU MODULE 2S/1M M/H PLC SELECT,1

Expected XML Output:

<?xml version="1.0" encoding="utf-8"?>
<Record>
<Header>
<RecordType>H</RecordType>
<CustID>1520982</CustPO>
<CustPO>wpg-A10026</CustPO>
<OrderDate>2020-11-24</OrderDate>
<UserID>STG</UserID>
<EmergencyFlag>N</EmergencyFlag>
</Header>
<Details> 
<RecordType>D</RecordType>
 2nd field is null
<PartNumber>52892</PartNumber>
<Description>LAMP RED OVAL STOP TAIL TURN</Description>
<Quantity>2</Quantity>
</Details>
<Details> 
<RecordType>D</RecordType>
 2nd field is null
<PartNumber>AL919338</PartNumber>
<Description>ECU MODULE 2S/1M M/H PLC SELECT</Description>
<Quantity>1</Quantity>
</Details>
</Record>

The output I got but somewhat wrong. If you see the description field its one entire field but it was replaced with commas. But not the actual tab spaces. H 1520982 wpg-A10026 2020-11-24 STG N D 52892 LAMP,,RED,OVAL,STOP,TAIL,TURN 2
D AL919338 ECU,MODULE,2S/1M,M/H,PLC,SELECT 1.

So What I did i corrected your code as content[i] = filter(bool, content[i].split("\t")) so it can actually give the comma where there was a tab delimit. For now it seems bit working but my second field under Details D was null so i wanted it as two commas ,, but that did not appear any hint for that? Thank you very much for your time. appreciated

The current output I got is this below With two issues still remaining after "D" there is a null value need to check if that is actually required to appear in the output. 2nd issue the Description field already had one comma, So need to check with customer if they can stop sending comma or other characters in description field. Or for better accuracy i need to convert this to an xml. But Still Thank you very much this is good to start with H,1520982,wpg-A10026,2020-11-24,STG,N D,52892,LAMP, RED OVAL STOP TAIL TURN,2, D,AL919338,ECU MODULE 2S/1M M/H PLC SELECT,1,

1 Answers1

0

Normally,the csv module should be able to it. If not (you do not have consistent amount of spaces separating the values, you can manually split the lines:

content = "INPUTGOESHERE".split("\n")

for i in range(len(content)):
    content[i] = filter(bool, content[i].split(" ")) # split the lines at spaces and filter out empty strings

outstr = ""

for line in content:
    line = ",".join(line) # convert values list to a comma separated string for each line
    outstr += line + "\n"

print(outstr)

See the edit of this answer for how to convert CSV to XML.

TheEagle
  • 5,808
  • 3
  • 11
  • 39
  • Thanks for the reply. This still doesn't work. I get the error as scripting.Traceback (most recent call last): File "script", line 2 content = inf.readlines() ^ IndentationError: expected an indented block – Shashank Malali Apr 27 '21 at 10:57
  • I am sorry I am not a python programmer or expert its just in our erp we use to convert the files if the customer has different formats. our standard format is xml or csv. – Shashank Malali Apr 27 '21 at 11:00
  • @ShashankMalali Please copy-paste my code in a file and see if it runs (you'll have to change the file paths, naturally, but _just that_) – TheEagle Apr 27 '21 at 11:09
  • @ShashankMalali My code is properly idented, you must have made an error / mistake while running the code. Did you use the interactive interpereter, or did you put the code in a file ? – TheEagle Apr 27 '21 at 11:11
  • Hi, Thanks that is what i said this code may not work. because we can't use "with open" and we can't pick a file from the path. We just take the entire data in csv file as string and the output should also be an string whether csv or an xml. – Shashank Malali Apr 27 '21 at 11:16
  • @ShashankMalali ah, I did not read your question carefully enough ! Edited my answer, replace `INPUTGOESHERE` with your actual input data, output will be printed to console – TheEagle Apr 27 '21 at 11:19
  • Thanks, I think we are approaching near. But this time I get error: "scripting.Traceback (most recent call last): File "script", line 9, in TypeError: 'str' object is not callable" Whereas line 9 is: line = ",".join(line). Is it because we are directly using join function to "," – Shashank Malali Apr 27 '21 at 11:41
  • The error seems to come from `filter("", content[i].split(" "))`. Please see if it works now. – TheEagle Apr 27 '21 at 11:47