So, recently I had to create an XML to JSON converter. Its output doesn't conform exactly to the JSON standard, but it comes pretty close. The xml2json function returns a dictionary representation of the XML object. Each element's attributes are stored in a dictionary under the 'attributes' key, and the element's text is stored under the 'text' key.
For example, your XML object would look like this after conversion:
json = {'elements':
    {'elem': [
        {'attributes': {'id': '1'}, 'text': 'some element'},
        {'attributes': {'id': '2'}, 'text': 'some other element'},
        {'attributes': {'id': '3'}, 'text': 'some element', 'nested': {
            'attributes': {'id': '1'}, 'text': 'other nested element'}},
    ]}}
Here is the xml2json function.
def xml2json(x):
    # x is a parsed DOM document; conversion starts at its first child node.
    def get_attributes(atts):
        # Turn the node's attribute map into a plain {name: value} dict.
        atts = dict(atts)
        d = {}
        for k, v in atts.items():
            d[k] = v.value
        return d

    def get_children(n, d):
        tmp = {}
        d.setdefault(n.nodeName, {})
        if n.attributes:
            tmp['attributes'] = get_attributes(n.attributes)
        if n.hasChildNodes():
            for c in n.childNodes:
                if c.nodeType == c.TEXT_NODE or c.nodeName == '#cdata-section':
                    # Text and CDATA content goes under the 'text' key.
                    tmp['text'] = c.data
                else:
                    children = get_children(c, {})
                    for ck, cv in children.items():
                        if ck in d[n.nodeName]:
                            # Repeated tag name: collect the siblings in a list.
                            if not isinstance(d[n.nodeName][ck], list):
                                tmpv = d[n.nodeName][ck]
                                d[n.nodeName][ck] = []
                                d[n.nodeName][ck].append(tmpv)
                            d[n.nodeName][ck].append(cv)
                        else:
                            d[n.nodeName][ck] = cv
        # Add this node's own attributes and text next to its children.
        for tk, tv in tmp.items():
            d[n.nodeName][tk] = tv
        return d

    return get_children(x.firstChild, {})
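To try the converter you need a parsed DOM document to pass in as x. Here is a minimal sketch, assuming the standard library's xml.dom.minidom is the parser (the function only relies on DOM members such as nodeName, childNodes and attributes, so that is my guess). The sample XML is made up to match the example dictionary above.

```python
from xml.dom import minidom

# Sample input (an assumption, written to match the example output above).
# It is kept free of whitespace between tags because that whitespace
# would become extra text nodes in the DOM.
SAMPLE_XML = (
    '<elements>'
    '<elem id="1">some element</elem>'
    '<elem id="2">some other element</elem>'
    '<elem id="3">some element<nested id="1">other nested element</nested></elem>'
    '</elements>'
)

doc = minidom.parseString(SAMPLE_XML)

root = doc.firstChild                 # the <elements> node the conversion starts from
first = root.childNodes[0]            # <elem id="1">
print(root.nodeName)                  # elements
print(first.attributes['id'].value)   # 1
print(first.firstChild.data)          # some element
```

The doc object is what you would hand to xml2json(doc). If your XML is pretty-printed, be aware that the whitespace-only text nodes between tags are also written to the 'text' key, so you may want to strip them first.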
Here is the searchjson function.
import re

def searchjson(sobj, reg):
    results = []
    if isinstance(sobj, str):  # use basestring instead of str on Python 2
        # search the string and return the output
        if re.search(reg, sobj):
            results.append(sobj)
    else:
        # dictionary: search every value, treating single values as one-item lists
        for k, v in sobj.items():
            newv = v
            if not isinstance(newv, list):
                newv = [newv]
            for elem in newv:
                has_attributes = False
                if isinstance(elem, dict):
                    has_attributes = bool(elem.get('attributes', False))
                res = searchjson(elem, reg)
                res = [] if not res else res
                for r in res:
                    r_is_dict = isinstance(r, dict)
                    r_no_attributes = r_is_dict and 'attributes' not in r.keys()
                    # copy the element's attributes onto the match so its id is kept
                    if has_attributes and r_no_attributes:
                        r.update({'attributes': elem.get('attributes', {})})
                    results.append({k: r})
    return results
I wrote the search function after reading your question. It hasn't been fully tested and probably has a few bugs, but I think it would be a good start for you. As for what you're looking for: it searches nested elements and attribute values, supports wildcards through regular expressions, and returns the attributes (including the id) of the matching elements.
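The wildcard matching is nothing more than re.search applied to each string, so a pattern can match anywhere inside the text. A small standalone sketch of that behaviour, using the text values from the example (the matching helper is a hypothetical name, only there for illustration):

```python
import re

texts = ['some element', 'some other element', 'other nested element']

def matching(pattern, candidates):
    # re.search looks for the pattern anywhere in the string,
    # not only at the beginning.
    return [t for t in candidates if re.search(pattern, t)]

for pattern in ('other', 'oth.*', '.the.'):
    # each pattern selects the two strings that contain "other"
    print(pattern, matching(pattern, texts))
```

If you need an exact match rather than a substring match, anchor the pattern yourself with ^ and $.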
You can use the function like so, where xml is the XML object to search and reg is a regex pattern string to search for. For example, 'other', 'oth.*' and '.the.' will all find the elements with "other" in them.
json = xml2json(xml)
results = searchjson(json, reg='other')
results will be a list of dictionaries.
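Each dictionary in the list repeats the path from the root down to the match, with the attributes of the matching element (and of its parents that have any) attached. The sketch below uses a result I traced by hand for the example document and reg='other', so treat it as an illustration rather than captured output. collect_ids is a hypothetical helper, not part of the functions above.

```python
# Hand-traced result of searchjson(json, reg='other') on the example
# document (an illustration, not captured output).
results = [
    {'elements': {'elem': {'attributes': {'id': '2'},
                           'text': 'some other element'}}},
    {'elements': {'elem': {'attributes': {'id': '3'},
                           'nested': {'attributes': {'id': '1'},
                                      'text': 'other nested element'}}}},
]

def collect_ids(node):
    """Hypothetical helper: gather every 'id' attribute, outermost first."""
    ids = []
    if isinstance(node, dict):
        attributes = node.get('attributes')
        if isinstance(attributes, dict) and 'id' in attributes:
            ids.append(attributes['id'])
        for key, value in node.items():
            if key != 'attributes':
                ids.extend(collect_ids(value))
    return ids

for r in results:
    print(collect_ids(r))
```

For the second match this yields the id of the enclosing elem followed by the id of the nested element, which is usually enough to locate the node again in the original XML.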
Hope it helps.