
I have a problem parsing XML into a pandas DataFrame in Python: I can't get the data out of the XML file.

I would like to read the XML file below and convert it to a DataFrame with these columns:

NIP, Name, Sell ID, contractor ID, contractor Name, contractor Adress, Documents ID, Date, K_23, K_24

Please help.

<?xml version="1.0" encoding="utf-8"?>
<JPK xmlns="http://jpk.mf.gov.pl/wzor/2017/11/13/1113/" xmlns:etd="http://crd.gov.pl/xml/schematy/dziedzinowe/mf/2016/01/25/eD/DefinicjeTypy/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://jpk.mf.gov.pl/wzor/2017/11/13/1113/ https://www.gov.pl/documents/2034621/2182793/Schemat_JPK_VAT(3)_v1-1.xsd/ab0741d5-fa6d-9596-b089-6778ea5df160">
  <Head>
    <ID="zzzzzzz" versionSchemy="1-1">zzzzzz</ID>
    <Tarfet>1</Targer>
    <CreateDate>2020-01-21T09:51:58</CreateDate>
    <Datefrom>2019-11-01</Datefrom>
    <DateTo>2019-11-30</DateTo>
    <System>xxxxx</System>
  </Head>
  <Client>
    <NIP>xxxxxxxx</NIP>
    <Name>xxxxxx</Name>
  </Client>
  <Sell>
    <Sell ID>1</Sell ID>
    <contractor ID >xxxxxxx</contractor ID>
    <contractor Name>xxxxxxx"</contractor Name>
    <contractor Adress>xxxxxxxxx</contractor Adress>
    <Documents ID >xxxxxxxxxx</Documents ID >
    <Date>2019-11-01</Date>
    <K_23>31532513.17</K_23>
    <K_24>5324.05</K_24>
  </Sell>
  <Sell>
    <Sell ID>2</Sell ID>
    <contractor ID >yyyy</contractor ID>
    <contractor Name>yyyyy"</contractor Name>
    <contractor Adress>yyyyyyy</contractor Adress>
    <Documents ID >yyyyyyyyy</Documents ID >
    <Date>2019-11-05</Date>
    <K_23>312513.17</K_23>
    <K_24>5532.05</K_24>
Benzi
  • Does this answer your question? [How to convert an XML file to nice pandas dataframe?](https://stackoverflow.com/questions/28259301/how-to-convert-an-xml-file-to-nice-pandas-dataframe) – manny Mar 27 '20 at 12:14
  • Add the code that you have tried, too. – Mohsen Mar 27 '20 at 12:15
  • Upload a valid XML file. The current one is not a valid XML document. – balderman Mar 27 '20 at 19:17

1 Answer

Use xmltodict:

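import xmltodict

# file_name is the path to your XML file
with open(file_name, 'rb') as f:
    data = xmltodict.parse(f.read())

data will be a nested dictionary (an OrderedDict in older versions of xmltodict) that mirrors the XML structure, and you can extract the data you want from it.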
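To go from the parsed document to one row per Sell element, something like the sketch below might work. It swaps in the standard library's xml.etree.ElementTree instead of xmltodict, and it assumes a cleaned-up, well-formed version of the file: the tag names without spaces (SellID, ContractorID, ContractorName), the sample values, and the helper names local_name, sell_rows, and to_dataframe are all made up for illustration and are not taken from the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the posted file.
# Tag names such as SellID and ContractorID are assumptions,
# since XML element names cannot contain spaces.
SAMPLE = """<JPK xmlns="http://jpk.mf.gov.pl/wzor/2017/11/13/1113/">
  <Client>
    <NIP>1234567890</NIP>
    <Name>Example Client</Name>
  </Client>
  <Sell>
    <SellID>1</SellID>
    <ContractorID>C-1</ContractorID>
    <ContractorName>First Contractor</ContractorName>
    <Date>2019-11-01</Date>
    <K_23>31532513.17</K_23>
    <K_24>5324.05</K_24>
  </Sell>
  <Sell>
    <SellID>2</SellID>
    <ContractorID>C-2</ContractorID>
    <ContractorName>Second Contractor</ContractorName>
    <Date>2019-11-05</Date>
    <K_23>312513.17</K_23>
    <K_24>5532.05</K_24>
  </Sell>
</JPK>
"""

# The root element declares a default namespace, so every tag must be
# looked up with that namespace; "jpk" is just a local prefix for it.
NS = {"jpk": "http://jpk.mf.gov.pl/wzor/2017/11/13/1113/"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def sell_rows(xml_text):
    """Return one dict per <Sell>, with the <Client> fields copied into each."""
    root = ET.fromstring(xml_text)
    client = root.find("jpk:Client", NS)
    client_fields = {}
    if client is not None:
        client_fields = {local_name(child.tag): child.text for child in client}

    rows = []
    for sell in root.findall("jpk:Sell", NS):
        row = dict(client_fields)  # NIP and Name repeat on every row
        row.update({local_name(child.tag): child.text for child in sell})
        rows.append(row)
    return rows


def to_dataframe(rows):
    """Build a DataFrame from the row dicts and convert the amount columns."""
    import pandas as pd  # imported here so the parsing step runs without pandas

    df = pd.DataFrame(rows)
    for col in ("K_23", "K_24"):
        df[col] = pd.to_numeric(df[col])
    return df


rows = sell_rows(SAMPLE)
```

Calling to_dataframe(rows) should then give one row per Sell element, with the client's NIP and Name repeated on each row. The real file has to be well-formed first, otherwise any XML parser will raise an error before you get this far.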

Bruno Mello