In addition to the simple XML parsing covered in the following question:

parsing interactive broker fundamental data

I have run into a more difficult XML parsing case. I keep getting two errors:

  • string indices must be integers

  • list indices must be integers or slices, not str
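The code that raised these errors is not shown, so the following is only a sketch of the usual cause. It assumes the XML was first converted into nested dicts and lists; the `parsed` structure and the `"text"` key are hypothetical stand-ins, not the output of any particular library. Both messages are `TypeError`s raised when a list or a string is indexed with a string key:

```python
# Hypothetical nested structure: repeated <CoID> elements end up in a list.
parsed = {
    "CoIDs": {
        "CoID": [
            {"Type": "RepNo", "text": "AC317"},
            {"Type": "CompanyName", "text": "HSBC Holdings plc (Hong Kong)"},
        ]
    }
}


def error_message(func):
    """Call func and return the TypeError message it raises, or None."""
    try:
        func()
    except TypeError as exc:
        return str(exc)
    return None


# Indexing the list of CoID entries with a string key.
list_error = error_message(lambda: parsed["CoIDs"]["CoID"]["Type"])

# Indexing a plain string (the element text) with a string key.
str_error = error_message(lambda: parsed["CoIDs"]["CoID"][0]["text"]["Type"])

# Working lookup: loop over the list and pick the entry with the wanted Type.
company_name = next(
    item["text"]
    for item in parsed["CoIDs"]["CoID"]
    if item["Type"] == "CompanyName"
)

print(list_error)
print(str_error)
print(company_name)
```

If this matches the situation, the fix is to check at each level whether the value is a dict, a list, or a string before indexing it.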

The XML:

<ReportSnapshot Major="1" Minor="0" Revision="1">
    <CoIDs>
       <CoID Type="RepNo">AC317</CoID>
        <CoID Type="CompanyName">HSBC Holdings plc (Hong Kong)</CoID>
    </CoIDs>
    <Issues>
        <Issue ID="1" Type="C" Desc="Common Stock" Order="1">
            <IssueID Type="Name">Ordinary Shares</IssueID>
            <IssueID Type="Ticker">5</IssueID>
            <IssueID Type="CUSIP">G4634U169</IssueID>
            <IssueID Type="ISIN">GB0005405286</IssueID>
            <IssueID Type="RIC">0005.HK</IssueID>
            <IssueID Type="SEDOL">6158163</IssueID>
            <IssueID Type="DisplayRIC">0005.HK</IssueID>
            <IssueID Type="InstrumentPI">312270</IssueID>
            <IssueID Type="QuotePI">1049324</IssueID>
            <Exchange Code="HKG" Country="HKG">Hong Kong Stock Exchange</Exchange>
            <MostRecentSplit Date="2009-03-12">1.14753</MostRecentSplit>
        </Issue>
    </Issues>

    <CoGeneralInfo>
        <CoStatus Code="1">Active</CoStatus>
        <CoType Code="EQU">Equity Issue</CoType>
        <LastModified>2018-07-20</LastModified>
        <LatestAvailableAnnual>2017-12-31</LatestAvailableAnnual>
        <LatestAvailableInterim>2018-03-31</LatestAvailableInterim>
        <Employees LastUpdated="2018-03-31">228899</Employees>
        <SharesOut Date="2018-07-25" TotalFloat="19880413090.0">19949959451.0</SharesOut>
        <ReportingCurrency Code="USD">U.S. Dollars</ReportingCurrency>
        <MostRecentExchange Date="2018-07-25">1.0</MostRecentExchange>
    </CoGeneralInfo>
    <peerInfo lastUpdated="2018-07-20T09:20:26">
        <IndustryInfo>
            <Industry type="TRBC" order="1" reported="0" code="5510101010" mnem="">Banks - NEC</Industry>
            <Industry type="NAICS" order="1" reported="0" code="52211" mnem="">Commercial Banking</Industry>
            <Industry type="NAICS" order="2" reported="0" code="52393" mnem="">Investment Advice</Industry>
            <Industry type="NAICS" order="3" reported="0" code="52392" mnem="">Portfolio Management</Industry>
            <Industry type="SIC" order="0" reported="1" code="6035" mnem="">Federal Savings Institutions</Industry>
            <Industry type="SIC" order="1" reported="0" code="6029" mnem="">Commercial Banks, Nec</Industry>
            <Industry type="SIC" order="2" reported="0" code="6282" mnem="">Investment Advice</Industry>
        </IndustryInfo>
    </peerInfo>

    <Ratios PriceCurrency="HKD" ReportingCurrency="USD" ExchangeRate="7.84530" LatestAvailableDate="2017-12-31">
        <Group ID="Price and Volume">
            <Ratio FieldName="NPRICE" Type="N">74.75000</Ratio>
            <Ratio FieldName="NHIG" Type="N">86.00000</Ratio>
            <Ratio FieldName="NLOW" Type="N">71.45000</Ratio>
            <Ratio FieldName="PDATE" Type="D">2018-07-26T00:00:00</Ratio>
            <Ratio FieldName="VOL10DAVG" Type="N">12.85415</Ratio>
            <Ratio FieldName="EV" Type="N">2455297.00000</Ratio>
        </Group>
        <Group ID="Income Statement">
            <Ratio FieldName="MKTCAP" Type="N">1493871.00000</Ratio>
            <Ratio FieldName="AREV" Type="N">321618.10000</Ratio>
            <Ratio FieldName="AEBITD" Type="N">177727.40000</Ratio>
            <Ratio FieldName="ANIAC" Type="N">86070.79000</Ratio>
        </Group>
    </Ratios>
</ReportSnapshot>

I want to convert this information to CSV with the following layout:

CompanyName                    Ticker  Industry type="TRBC"  Industry type="NAICS"  LastModified  ReportingCurrency  NPRICE    MKTCAP
HSBC Holdings plc (Hong Kong)  5       Banks - NEC           Commercial Banking     2018-07-20    USD                74.75000  1493871.00000
Geshode
Aqueous Carlos
  • Possible duplicate of [How do I parse XML in Python?](https://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python) – Aqueous Carlos Nov 11 '18 at 09:28

1 Answer

Python has a built-in csv module for writing CSV files. For parsing the XML I recommend BeautifulSoup, which makes this problem easy (the 'xml' parser used here requires the lxml package to be installed):

xml_data = """<ReportSnapshot Major="1" Minor="0" Revision="1">
    <CoIDs>
       <CoID Type="RepNo">AC317</CoID>
        <CoID Type="CompanyName">HSBC Holdings plc (Hong Kong)</CoID>
    </CoIDs>
    <Issues>
        <Issue ID="1" Type="C" Desc="Common Stock" Order="1">
            <IssueID Type="Name">Ordinary Shares</IssueID>
            <IssueID Type="Ticker">5</IssueID>
            <IssueID Type="CUSIP">G4634U169</IssueID>
            <IssueID Type="ISIN">GB0005405286</IssueID>
            <IssueID Type="RIC">0005.HK</IssueID>
            <IssueID Type="SEDOL">6158163</IssueID>
            <IssueID Type="DisplayRIC">0005.HK</IssueID>
            <IssueID Type="InstrumentPI">312270</IssueID>
            <IssueID Type="QuotePI">1049324</IssueID>
            <Exchange Code="HKG" Country="HKG">Hong Kong Stock Exchange</Exchange>
            <MostRecentSplit Date="2009-03-12">1.14753</MostRecentSplit>
        </Issue>
    </Issues>

    <CoGeneralInfo>
        <CoStatus Code="1">Active</CoStatus>
        <CoType Code="EQU">Equity Issue</CoType>
        <LastModified>2018-07-20</LastModified>
        <LatestAvailableAnnual>2017-12-31</LatestAvailableAnnual>
        <LatestAvailableInterim>2018-03-31</LatestAvailableInterim>
        <Employees LastUpdated="2018-03-31">228899</Employees>
        <SharesOut Date="2018-07-25" TotalFloat="19880413090.0">19949959451.0</SharesOut>
        <ReportingCurrency Code="USD">U.S. Dollars</ReportingCurrency>
        <MostRecentExchange Date="2018-07-25">1.0</MostRecentExchange>
    </CoGeneralInfo>
    <peerInfo lastUpdated="2018-07-20T09:20:26">
        <IndustryInfo>
            <Industry type="TRBC" order="1" reported="0" code="5510101010" mnem="">Banks - NEC</Industry>
            <Industry type="NAICS" order="1" reported="0" code="52211" mnem="">Commercial Banking</Industry>
            <Industry type="NAICS" order="2" reported="0" code="52393" mnem="">Investment Advice</Industry>
            <Industry type="NAICS" order="3" reported="0" code="52392" mnem="">Portfolio Management</Industry>
            <Industry type="SIC" order="0" reported="1" code="6035" mnem="">Federal Savings Institutions</Industry>
            <Industry type="SIC" order="1" reported="0" code="6029" mnem="">Commercial Banks, Nec</Industry>
            <Industry type="SIC" order="2" reported="0" code="6282" mnem="">Investment Advice</Industry>
        </IndustryInfo>
    </peerInfo>

    <Ratios PriceCurrency="HKD" ReportingCurrency="USD" ExchangeRate="7.84530" LatestAvailableDate="2017-12-31">
        <Group ID="Price and Volume">
            <Ratio FieldName="NPRICE" Type="N">74.75000</Ratio>
            <Ratio FieldName="NHIG" Type="N">86.00000</Ratio>
            <Ratio FieldName="NLOW" Type="N">71.45000</Ratio>
            <Ratio FieldName="PDATE" Type="D">2018-07-26T00:00:00</Ratio>
            <Ratio FieldName="VOL10DAVG" Type="N">12.85415</Ratio>
            <Ratio FieldName="EV" Type="N">2455297.00000</Ratio>
        </Group>
        <Group ID="Income Statement">
            <Ratio FieldName="MKTCAP" Type="N">1493871.00000</Ratio>
            <Ratio FieldName="AREV" Type="N">321618.10000</Ratio>
            <Ratio FieldName="AEBITD" Type="N">177727.40000</Ratio>
            <Ratio FieldName="ANIAC" Type="N">86070.79000</Ratio>
        </Group>
    </Ratios>
</ReportSnapshot>"""

from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(xml_data, 'xml')

headers = ['CompanyName',
           'Ticker',
           'Industry type="TRBC"',
           'Industry type="NAICS"',
           'LastModified',
           'ReportingCurrency',
           'NPRICE',
           'MKTCAP']


with open('data.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile, delimiter=',', quotechar='"')
    csvwriter.writerow(headers)
    row = []
    row.append(soup.select_one('CoID[Type="CompanyName"]').text)
    row.append(soup.select_one('IssueID[Type="Ticker"]').text)
    row.append(soup.select_one('Industry[type="TRBC"]').text)
    row.append(soup.select_one('Industry[type="NAICS"]').text)
    row.append(soup.select_one('LastModified').text)
    row.append(soup.select_one('ReportingCurrency[Code]')['Code'])
    row.append(soup.select_one('Ratio[FieldName="NPRICE"]').text)
    row.append(soup.select_one('Ratio[FieldName="MKTCAP"]').text)
    csvwriter.writerow(row)

The result is written to data.csv (screenshot from LibreOffice):

[screenshot: data.csv opened in LibreOffice]
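If installing BeautifulSoup and lxml is not an option, the same row can probably be built with the standard library alone. This is a minimal sketch, not part of the answer's original code: it uses `xml.etree.ElementTree` with its limited XPath support, a shortened copy of the XML (named `sample_xml` here), a hypothetical helper `text_of`, and an in-memory buffer instead of data.csv.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened copy of the report, keeping only the elements read below.
sample_xml = """<ReportSnapshot Major="1" Minor="0" Revision="1">
    <CoIDs>
        <CoID Type="RepNo">AC317</CoID>
        <CoID Type="CompanyName">HSBC Holdings plc (Hong Kong)</CoID>
    </CoIDs>
    <Issues>
        <Issue ID="1" Type="C" Desc="Common Stock" Order="1">
            <IssueID Type="Ticker">5</IssueID>
        </Issue>
    </Issues>
    <CoGeneralInfo>
        <LastModified>2018-07-20</LastModified>
        <ReportingCurrency Code="USD">U.S. Dollars</ReportingCurrency>
    </CoGeneralInfo>
    <peerInfo lastUpdated="2018-07-20T09:20:26">
        <IndustryInfo>
            <Industry type="TRBC" order="1" code="5510101010">Banks - NEC</Industry>
            <Industry type="NAICS" order="1" code="52211">Commercial Banking</Industry>
            <Industry type="NAICS" order="2" code="52393">Investment Advice</Industry>
        </IndustryInfo>
    </peerInfo>
    <Ratios PriceCurrency="HKD" ReportingCurrency="USD">
        <Group ID="Price and Volume">
            <Ratio FieldName="NPRICE" Type="N">74.75000</Ratio>
        </Group>
        <Group ID="Income Statement">
            <Ratio FieldName="MKTCAP" Type="N">1493871.00000</Ratio>
        </Group>
    </Ratios>
</ReportSnapshot>"""

root = ET.fromstring(sample_xml)


def text_of(path):
    """Return the text of the first element matching path, or '' if none."""
    element = root.find(path)
    return element.text if element is not None else ""


headers = ['CompanyName', 'Ticker', 'Industry type="TRBC"',
           'Industry type="NAICS"', 'LastModified', 'ReportingCurrency',
           'NPRICE', 'MKTCAP']

# find() returns the first match in document order, so only the first
# NAICS industry is taken, as with select_one in the BeautifulSoup code.
row = [
    text_of(".//CoID[@Type='CompanyName']"),
    text_of(".//IssueID[@Type='Ticker']"),
    text_of(".//Industry[@type='TRBC']"),
    text_of(".//Industry[@type='NAICS']"),
    text_of(".//LastModified"),
    root.find(".//ReportingCurrency").get("Code"),
    text_of(".//Ratio[@FieldName='NPRICE']"),
    text_of(".//Ratio[@FieldName='MKTCAP']"),
]

# Write to an in-memory buffer; replace it with
# open('data.csv', 'w', newline='') to produce a file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerow(row)
print(buffer.getvalue())
```

The attribute predicates (`[@Type='...']`) do the same job as the CSS attribute selectors in the BeautifulSoup version, at the cost of slightly more verbose paths.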

Andrej Kesely