My code is shown below. It ingests XML from here: https://www.sec.gov/Archives/edgar/data/1413909/000149315218018055/dsgt-20180930.xml.

I would like to create a dictionary from the namespace declarations on the root 'xbrli:xbrl' element, i.e. from the prefixes (keys) and URIs (values) shown in the second block of code below.

However, my code returns an empty dictionary. It completely skips xbrli:xbrl and goes directly to link:schemaRef.

import requests
import pandas as pd
import urllib.request  as urllib2
import xml.etree.ElementTree as ET
from lxml import etree

def namespaces(url):
    tree = ET.parse(urllib2.urlopen(url))
    root = tree.getroot()
    d = dict(root.attrib)
    return d.keys()

I would like to create a dictionary from this:

<xbrli:xbrl
  xmlns:xbrli="http://www.xbrl.org/2003/instance"
  xmlns:DSGT="http://dsgtag.com/20180930"
  xmlns:country="http://xbrl.sec.gov/country/2017-01-31"
  xmlns:currency="http://xbrl.sec.gov/currency/2017-01-31"
  xmlns:dei="http://xbrl.sec.gov/dei/2018-01-31"
  xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
  xmlns:link="http://www.xbrl.org/2003/linkbase"
  xmlns:nonnum="http://www.xbrl.org/dtr/type/non-numeric"
  xmlns:num="http://www.xbrl.org/dtr/type/numeric"
  xmlns:ref="http://www.xbrl.org/2006/ref"
  xmlns:srt="http://fasb.org/srt/2018-01-31"
  xmlns:us-gaap="http://fasb.org/us-gaap/2018-01-31"
  xmlns:us-roles="http://fasb.org/us-roles/2018-01-31"
  xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
  xmlns:xbrldt="http://xbrl.org/2005/xbrldt"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>...</xbrli:xbrl>
Tomalak
  • There is no "direct conversion" between XML and dict(); it cannot be inferred automatically. You need to know the layout and the expected output before you expand it, and it can't support "endless layers / unknown datatypes". – Jack Wu Mar 07 '19 at 06:36
  • You might want to check https://docs.python.org/2/library/xml.etree.elementtree.html, which talks about namespaces. – Jack Wu Mar 07 '19 at 06:54
  • You seem to be asking the same as in https://stackoverflow.com/q/42320779/407651 – mzjn Mar 07 '19 at 07:18
  • The sample you have posted is not a dict. If you want to have a dict as output - post the dict structure. Example: `{'key':'value'}` – balderman Mar 07 '19 at 08:35
  • I guess you want to automatically build a dictionary of namespace prefixes and URIs, so that you can use them in your XPath expressions without hard-coding them? That's not such a good idea. Namespace URIs are meant to be hard-coded. – Tomalak Mar 07 '19 at 09:16

1 Answer

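The solution is based on ElementTree's iterparse with the 'start-ns' event, which reports each namespace declaration as a (prefix, URI) pair. ElementTree does not expose xmlns declarations through root.attrib, which is why the dictionary in the question comes back empty.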

from io import BytesIO
import xml.etree.ElementTree as ET
import requests
from pprint import pprint

r = requests.get('https://www.sec.gov/Archives/edgar/data/1413909/000149315218018055/dsgt-20180930.xml')
if r.status_code == 200:
    # Parse the raw bytes so the parser can honour the XML encoding declaration.
    # Each 'start-ns' event yields a (prefix, uri) tuple, which dict() consumes directly.
    events = ET.iterparse(BytesIO(r.content), events=['start-ns'])
    document_namespaces = dict(node for _, node in events)
    pprint(document_namespaces)

Output

{'DSGT': 'http://dsgtag.com/20180930',
 'country': 'http://xbrl.sec.gov/country/2017-01-31',
 'currency': 'http://xbrl.sec.gov/currency/2017-01-31',
 'dei': 'http://xbrl.sec.gov/dei/2018-01-31',
 'iso4217': 'http://www.xbrl.org/2003/iso4217',
 'link': 'http://www.xbrl.org/2003/linkbase',
 'nonnum': 'http://www.xbrl.org/dtr/type/non-numeric',
 'num': 'http://www.xbrl.org/dtr/type/numeric',
 'ref': 'http://www.xbrl.org/2006/ref',
 'srt': 'http://fasb.org/srt/2018-01-31',
 'us-gaap': 'http://fasb.org/us-gaap/2018-01-31',
 'us-roles': 'http://fasb.org/us-roles/2018-01-31',
 'xbrldi': 'http://xbrl.org/2006/xbrldi',
 'xbrldt': 'http://xbrl.org/2005/xbrldt',
 'xbrli': 'http://www.xbrl.org/2003/instance',
 'xlink': 'http://www.w3.org/1999/xlink',
 'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}
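
To try the idea without hitting the network, here is a minimal offline sketch. SAMPLE is a hypothetical, trimmed-down stand-in for the filing and collect_namespaces is an illustrative helper name; neither comes from the original post. It shows that root.attrib is empty for a root that carries only xmlns declarations, and that the prefix-to-URI dictionary collected from 'start-ns' events can then be passed to find().

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the SEC filing (assumption, not the real file).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<xbrli:xbrl
  xmlns:xbrli="http://www.xbrl.org/2003/instance"
  xmlns:link="http://www.xbrl.org/2003/linkbase"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:schemaRef xlink:type="simple" xlink:href="dsgt-20180930.xsd"/>
</xbrli:xbrl>"""


def collect_namespaces(source):
    """Return {prefix: uri} built from the 'start-ns' events of a file-like source.

    If a prefix is declared more than once, the last declaration wins.
    """
    return dict(ns for _, ns in ET.iterparse(source, events=["start-ns"]))


root = ET.fromstring(SAMPLE)

# xmlns:* declarations are consumed by the parser and never show up as attributes.
root_attributes = dict(root.attrib)

namespaces = collect_namespaces(io.BytesIO(SAMPLE))

# The collected mapping can be handed to find()/findall() to resolve prefixes.
schema_ref = root.find("link:schemaRef", namespaces)

# Attribute names are stored in {uri}local form, so the URI is spelled out here.
href = schema_ref.get("{%s}href" % namespaces["xlink"])

print(root_attributes)
print(namespaces)
print(href)
```

As the comment from Tomalak points out, hard-coding the namespace URIs you actually query is usually the more robust choice; collecting them dynamically is mainly useful for inspecting a document you have not seen before.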
balderman