I'm working with an old SOAP service from the TSETMC site. I spent too much time trying to handle its response as an object, so I finally decided to handle it as text and now have the response in a string. I want to parse just three elements from it with Python and build a customized output as a pandas DataFrame.

Here is the string:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <TradeLastDayResponse xmlns="http://tsetmc.com/">
            <TradeLastDayResult>
                <xs:schema id="TradeLastDay" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
                    <xs:element name="TradeLastDay" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
                        <xs:complexType>
                            <xs:choice minOccurs="0" maxOccurs="unbounded">
                                <xs:element name="TradeLastDay">
                                    <xs:complexType>
                                        <xs:sequence>
                                            <xs:element name="LVal18AFC" type="xs:string" minOccurs="0" />
                                            <xs:element name="PriceFirst" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="DEven" type="xs:int" minOccurs="0" />
                                            <xs:element name="InsCode" type="xs:long" minOccurs="0" />
                                            <xs:element name="LVal30" type="xs:string" minOccurs="0" />
                                            <xs:element name="PClosing" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="PDrCotVal" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="ZTotTran" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="QTotTran5J" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="QTotCap" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="PriceChange" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="PriceMin" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="PriceMax" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="PriceYesterday" type="xs:decimal" minOccurs="0" />
                                            <xs:element name="Last" type="xs:unsignedByte" minOccurs="0" />
                                            <xs:element name="HEven" type="xs:int" minOccurs="0" />
                                        </xs:sequence>
                                    </xs:complexType>
                                </xs:element>
                            </xs:choice>
                        </xs:complexType>
                    </xs:element>
                </xs:schema>
                <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
                    <TradeLastDay xmlns="">
                        <TradeLastDay diffgr:id="TradeLastDay1" msdata:rowOrder="0">
                            <LVal18AFC>افران4</LVal18AFC>
                            <PriceFirst>16577.00</PriceFirst>
                            <DEven>20220611</DEven>
                            <InsCode>54162468776731366</InsCode>
                            <LVal30>صندوق س افرا نماد پايدار-ثابت</LVal30>
                            <PClosing>16577.00</PClosing>
                            <PDrCotVal>16577.00</PDrCotVal>
                            <ZTotTran>1</ZTotTran>
                            <QTotTran5J>8445000</QTotTran5J>
                            <QTotCap>139992765000.00</QTotCap>
                            <PriceChange>27.00</PriceChange>
                            <PriceMin>16577.00</PriceMin>
                            <PriceMax>16577.00</PriceMax>
                            <PriceYesterday>16550.00</PriceYesterday>
                            <Last>0</Last>
                            <HEven>113533</HEven>
                        </TradeLastDay>
                        <TradeLastDay diffgr:id="TradeLastDay2" msdata:rowOrder="1">
                            <LVal18AFC>كمند4</LVal18AFC>
                            <PriceFirst>10025.00</PriceFirst>
                            <DEven>20220611</DEven>
                            <InsCode>55266031060966810</InsCode>
                            <LVal30>صندوق س. با درآمد ثابت كمند</LVal30>
                            <PClosing>10025.00</PClosing>
                            <PDrCotVal>10025.00</PDrCotVal>
                            <ZTotTran>1</ZTotTran>
                            <QTotTran5J>29919000</QTotTran5J>
                            <QTotCap>299937975000.00</QTotCap>
                            <PriceChange>7.00</PriceChange>
                            <PriceMin>10025.00</PriceMin>
                            <PriceMax>10025.00</PriceMax>
                            <PriceYesterday>10018.00</PriceYesterday>
                            <Last>0</Last>
                            <HEven>95750</HEven>
                        </TradeLastDay>
                        <TradeLastDay diffgr:id="TradeLastDay3" msdata:rowOrder="2">
                            <LVal18AFC>فرآور</LVal18AFC>
                            <PriceFirst>31650.00</PriceFirst>
                            <DEven>20220611</DEven>
                            <InsCode>408934423224097</InsCode>
                            <LVal30>فرآوري‌موادمعدني‌ايران‌</LVal30>
                            <PClosing>29910.00</PClosing>
                            <PDrCotVal>29780.00</PDrCotVal>
                            <ZTotTran>493</ZTotTran>
                            <QTotTran5J>980820</QTotTran5J>
                            <QTotCap>29332422910.00</QTotCap>
                            <PriceChange>-990.00</PriceChange>
                            <PriceMin>29000.00</PriceMin>
                            <PriceMax>31650.00</PriceMax>
                            <PriceYesterday>30770.00</PriceYesterday>
                            <Last>0</Last>
                            <HEven>121522</HEven>
                        </TradeLastDay>
                        <TradeLastDay diffgr:id="TradeLastDay4" msdata:rowOrder="3">
                            <LVal18AFC>اپال</LVal18AFC>
                            <PriceFirst>19150.00</PriceFirst>
                            <DEven>20220611</DEven>
                            <InsCode>655060129740445</InsCode>
                            <LVal30>فرآوري معدني اپال كاني پارس</LVal30>
                            <PClosing>19790.00</PClosing>
                            <PDrCotVal>19650.00</PDrCotVal>
                            <ZTotTran>1467</ZTotTran>
                            <QTotTran5J>1261689</QTotTran5J>
                            <QTotCap>24966884620.00</QTotCap>
                            <PriceChange>-140.00</PriceChange>
                            <PriceMin>19150.00</PriceMin>
                            <PriceMax>19920.00</PriceMax>
                            <PriceYesterday>19790.00</PriceYesterday>
                            <Last>0</Last>
                            <HEven>121425</HEven>
                        </TradeLastDay>
                    </TradeLastDay>
                </diffgr:diffgram>
            </TradeLastDayResult>
        </TradeLastDayResponse>
    </soap:Body>
</soap:Envelope>

My result should look something like this:

             InsCode     DEven  PriceYesterday
0  54162468776731366  20220611         16550.0
1  55266031060966810  20220611         10018.0
2    408934423224097  20220611         30770.0
3    655060129740445  20220611         19790.0

I've tried to parse it with xml.etree.ElementTree.fromstring(my_xml_string), but I could not get this result.

Ibrahim
Hadi Rahjoo

1 Answer

The process I used:

  1. Save the XML string as a file
  2. Create a dict from the XML (courtesy @firelion.cis)
  3. Explore the dict to find where the variables sit
  4. Make a template df, then collect the data

Import modules and create root

import xml.etree.ElementTree as ET
import pandas as pd

xml = 'path to xml file'

tree = ET.parse(xml)
r = tree.getroot()
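
Since the response is already held in a string, the file step can probably be skipped: ET.fromstring returns the root element directly. A minimal sketch, where xml_string is a shortened stand-in for the real response (my assumption, for illustration only):

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the SOAP response; the real string is much longer.
xml_string = (
    '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    '<soap:Body><TradeLastDayResponse xmlns="http://tsetmc.com/"/></soap:Body>'
    '</soap:Envelope>'
)

# fromstring returns the root Element itself, so no getroot() call is needed
r = ET.fromstring(xml_string)

# ElementTree writes namespaced tags as '{namespace-uri}localname'
print(r.tag)
```

The printed tag carries the namespace URI in curly braces, which matters for any later lookup by tag name.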

Create dict from xml file

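def xml_to_dict(r):
    # ElementTree stores tags as '{namespace}name'; keep only the local name
    # so that keys such as 'Envelope' and 'diffgram' can be looked up below
    tag = r.tag.split('}')[-1]
    if len(r) == 0:
        # leaf element: map the tag to its text
        return {tag: r.text}
    # element with children: map the tag to a list of child dicts
    return {tag: [xml_to_dict(child) for child in r]}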


d = xml_to_dict(r)
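
Exploring the dict by hand gets tedious because dicts and lists alternate at every level. One way to do it, sketched here with a hypothetical helper find_paths (not part of any library) and a small hand-written dict of the same shape, is to walk the structure and report the key/index path to each occurrence of a field:

```python
def find_paths(node, target, path=()):
    """Yield the key/index path to every leaf stored under the key `target`."""
    if isinstance(node, dict):
        for key, value in node.items():
            # report only leaves, i.e. keys whose value is text rather than a list
            if key == target and not isinstance(value, list):
                yield path + (key,)
            yield from find_paths(value, target, path + (key,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_paths(item, target, path + (i,))


# Small stand-in with the same nesting pattern as the converted response
d = {'TradeLastDay': [
    {'TradeLastDay': [{'DEven': '20220611'}, {'InsCode': '408934423224097'}]},
]}

for p in find_paths(d, 'InsCode'):
    print(p)
```

Each printed tuple can be read off as a chain of subscripts, which is how an expression like the long one used for 'tld' can be assembled.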

Create a template df to fill

df_a = pd.DataFrame(
        {'DEven': [],
         'InsCode': [],
         'PriceYesterday': []
         }
    )

Create a variable 'tld' to iterate over

tld = d['Envelope'][0]['Body'][0]['TradeLastDayResponse'][0]['TradeLastDayResult'][1]['diffgram'][0]['TradeLastDay']

Iterate over 'tld', collect data and update df

for row in tld:
    fields = row['TradeLastDay']
    # each field is looked up by its position within the row
    deven = fields[2]['DEven']
    inscode = fields[3]['InsCode']
    price_yest = fields[13]['PriceYesterday']
    df_b = pd.DataFrame(
        {'DEven': [deven],
         'InsCode': [inscode],
         'PriceYesterday': [price_yest]
         }
    )
    # for every iteration, append the collected row to the template df
    df_a = pd.concat([df_a, df_b], ignore_index=True)
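
The positional indexes in the loop rely on every row carrying all sixteen fields in the same order, while the schema marks each field as minOccurs="0". A possible alternative, sketched under assumptions rather than tested against the live service, looks the fields up by name straight from the string. Here xml_string is a trimmed two-row stand-in for the real response, and the sketch assumes the three wanted fields are present in every row:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real response (two rows, three fields each)
xml_string = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    '<soap:Body><TradeLastDayResponse xmlns="http://tsetmc.com/"><TradeLastDayResult>'
    '<diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"'
    ' xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">'
    '<TradeLastDay xmlns="">'
    '<TradeLastDay diffgr:id="TradeLastDay1" msdata:rowOrder="0">'
    '<DEven>20220611</DEven><InsCode>54162468776731366</InsCode>'
    '<PriceYesterday>16550.00</PriceYesterday></TradeLastDay>'
    '<TradeLastDay diffgr:id="TradeLastDay2" msdata:rowOrder="1">'
    '<DEven>20220611</DEven><InsCode>55266031060966810</InsCode>'
    '<PriceYesterday>10018.00</PriceYesterday></TradeLastDay>'
    '</TradeLastDay></diffgr:diffgram>'
    '</TradeLastDayResult></TradeLastDayResponse></soap:Body></soap:Envelope>'
)

root = ET.fromstring(xml_string)

rows = []
# The data rows sit under xmlns="", so their tags carry no namespace prefix.
for elem in root.iter('TradeLastDay'):
    # The container element shares the tag name but has no InsCode child.
    if elem.find('InsCode') is None:
        continue
    rows.append({
        'InsCode': int(elem.findtext('InsCode')),
        'DEven': int(elem.findtext('DEven')),
        'PriceYesterday': float(elem.findtext('PriceYesterday')),
    })

print(rows)
```

Passing rows to pd.DataFrame(rows) should then give the three columns in the order InsCode, DEven, PriceYesterday, with numeric dtypes instead of strings. A row that lacks one of the fields would raise a TypeError in this sketch, so real data may need a guard around the conversions.
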
fvg