Tools in use:
- Power Automate Desktop (PAD)
- Command Prompt (CMD) Session operated by PAD
- Python 3.10 (importing NumPy and Pandas, and using ElementTree)
- SQL to read data and insert into primary database
Method:
XML to CSV conversion
I'm using Power Automate Desktop (PAD) to automate all of this because it is what I know.
The conversion method uses Python inside of CMD, importing NumPy and Pandas and using ElementTree.
Goal(s):
I would like to keep the namespace from being written to the headers so SQL can interact with the CSV data
I would like to append a master CSV file with the new data coming in from every call
This last goal is more of a wishlist item and I will likely take it over to a PAD forum. I'm just including it in the rare event someone has a solution:
- I'd like to avoid using CMD altogether, but given that PAD's built-in Python support is limited to IronPython, I have to use a CMD session to run regular Python.
- PAD's Python module uses IronPython 2.7, so I cannot access the libraries above and I don't know how to write the Python code below in IronPython. If I could get this sorted out, the entire process would be much more efficient. (Again, this is a nice-to-have; the other goals are the priority.)
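For what it's worth, the only route I can picture for IronPython 2.7 is dropping NumPy/Pandas entirely and using just the standard library (xml.etree plus csv), along the lines of the sketch below. This is untested on IronPython itself (I'm assuming its stdlib ships csv and xml.etree, which I haven't verified), and the XML string and output filename are just stand-ins:

```python
import csv
import sys
from xml.etree import ElementTree

# Stdlib-only conversion sketch (no NumPy/Pandas), written to run on
# Python 2.7 or 3.x. The XML string stands in for the real web-service body.
xml = ('<root xmlns="http://example.com/ns">'
       '<item><Name>Widget</Name><Qty>3</Qty></item>'
       '<item><Name>Gadget</Name><Qty>7</Qty></item>'
       '</root>')

root = ElementTree.fromstring(xml)
rows = []
for child in root:
    row = {}
    for elem in child.iter():
        tag = elem.tag.rsplit('}', 1)[-1]  # strip the '{namespace}' prefix
        if elem.text and elem.text.strip():
            row[tag] = elem.text.strip()
    rows.append(row)

headers = sorted(set(key for row in rows for key in row))

# The csv module wants binary mode on 2.x but newline='' on 3.x
if sys.version_info[0] == 2:
    f = open('IRONPY_TABLE.csv', 'wb')
else:
    f = open('IRONPY_TABLE.csv', 'w', newline='')
writer = csv.DictWriter(f, fieldnames=headers)
writer.writeheader()
writer.writerows(rows)
f.close()
```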
Issues:
- xmlns is being added to some of the headers when the converted data is written to CSV
- The Python code generates a new .csv file on each loop; I would like it to simply append the data to a common .csv file to build the dataset.
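To illustrate the append behaviour I'm after (a sketch, not my current code; 'MASTER.csv' and the column names are placeholders): pandas can open the file in append mode and write the header row only when the master file doesn't exist yet.

```python
import os
import pandas as pd

# Placeholder data standing in for one converted web-service call
df = pd.DataFrame({'Col1': [1, 2], 'Col2': ['a', 'b']})

master = 'MASTER.csv'
# Append to the master file; emit the header row only on the first run,
# i.e. when the file does not exist yet.
df.to_csv(master, mode='a', header=not os.path.exists(master), index=False)
```

Each loop iteration would then add rows to MASTER.csv instead of replacing the file.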
Details (Probably Unnecessary):
First, I am no Python expert, and I am pretty novice with SQL as well.
Currently, the Python code (see below) converts a web service call, with an XML-formatted body, to CSV format, then exports that data to an actual .csv file. This works great, but it overwrites the file each time, which means I need to program scripts to read this data before the file is deleted/replaced with the new one. I then use SQL to INSERT INTO a main dataset (appending the data). It also prints the xmlns in some of the headers, which is something I need to avoid.
If you are wondering why I am performing the conversion and not simply parsing the XML data: my client requires CSV format for their datasets; otherwise I'd just parse out the XML data as needed. The other reason is that this data is updated incrementally at set intervals, building an audit trail, so a bulk conversion is not possible.
What I would like to do is have the Python code perform the conversion and then append a main dataset inside of a CSV file.
The other issue here is that the xmlns from the XML is being pulled into some (not all) of the headers of the CSV table, which has made using SQL to read and insert into the main table a problem. I cannot figure out a way around this (again, novice).
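For reference, this is what I mean by stripping the namespace — a sketch with a made-up namespace and XML, applying the same rsplit trick to both tag and attribute names before they become headers:

```python
from xml.etree import ElementTree

# Toy XML with a default namespace, standing in for the real data
xml = '''<root xmlns="http://example.com/ns">
  <item id="1"><Name>Widget</Name></item>
</root>'''

def strip_ns(name):
    # '{http://example.com/ns}Name' -> 'Name'; names without a
    # namespace pass through unchanged.
    return name.rsplit('}', 1)[-1]

root = ElementTree.fromstring(xml)
for elem in root.iter():
    print(strip_ns(elem.tag), {strip_ns(k): v for k, v in elem.attrib.items()})
```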
Also, if anyone knows how one would write this in IronPython 2.7, that would be great too, because I'd be able to get around using the CMD session.
So, if I could use Python to append a primary table with the converted data, while escaping the namespace, this would solve all of my (current) issues and make the movement of this data far more efficient.
Also, due to the limited toolset I have, I am scripting this with Power Automate driving a CMD session.
Python Code (within CMD environment):
cd\WORK\XML
python
import numpy as np
import pandas as pd
from xml.etree import ElementTree

maintree = ElementTree.parse('FILE_XML.xml')
parentroot = maintree.getroot()

# Every distinct (namespace-qualified) tag in the document
all_tags = list(set([elem.tag for elem in parentroot.iter()]))

rows = []
for child in parentroot:
    temp_dict = {}
    for i in all_tags:
        tag_values = {}
        for inners in child.iter(i):
            temp_tag_value = {}
            # Merge element attributes as-is; namespace-qualified
            # attribute names keep their '{...}' prefix here.
            temp_dict.update(inners.attrib)
            # Strip the '{namespace}' prefix from the tag name;
            # [-1] is safe even for tags without a namespace.
            temp_tag_value[i.rsplit("}", 1)[-1]] = inners.text
            tag_values.update(temp_tag_value)
        temp_dict.update(tag_values)
    rows.append(temp_dict)

dataframe = pd.DataFrame.from_dict(rows, orient='columns')
dataframe = dataframe.replace({np.nan: None})
dataframe.to_csv('FILE_TABLE_CMD.csv', index=False)