
I have been trying to implement the logic, but I don't know how to start.

My XML data is below. Each child `<role>` element should have 3 fields (child elements: name, salary, job_description). If any one of these fields is missing, it should be populated as null.

Code:

import xml.etree.ElementTree as ET

xml_data='''
<job_details>
    <role>
        <name>Vilan</name>
        <salary>$5.95</salary>
        <job_description>Developer</job_description>
    </role>
    <role>
        <name>Dils</name>
        <salary>$7.95</salary>
    </role>
    <role>
        <name>Raj</name>
        <job_description>Fullstack Developer</job_description>
    </role>
</job_details>
'''

get_root_element = ET.fromstring(xml_data)

out = []
for item in get_root_element:
    res = ','.join(x.text for x in item)
    out.append(res)

Expected output:

['Vilan,$5.95,Developer', 'Dils,$7.95,null', 'Raj,null,Fullstack Developer']
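The attempt above cannot produce `null`. Iterating over a `<role>` element only visits the child elements that are actually present, so a missing field leaves no slot in the joined string. This check reuses the same XML with the original loop written as a comprehension:

```python
import xml.etree.ElementTree as ET

xml_data = '''
<job_details>
    <role>
        <name>Vilan</name>
        <salary>$5.95</salary>
        <job_description>Developer</job_description>
    </role>
    <role>
        <name>Dils</name>
        <salary>$7.95</salary>
    </role>
    <role>
        <name>Raj</name>
        <job_description>Fullstack Developer</job_description>
    </role>
</job_details>
'''

root = ET.fromstring(xml_data)

# Iterating over an element yields only its existing children, so a missing
# <salary> or <job_description> simply disappears instead of becoming null.
out = [','.join(x.text for x in item) for item in root]
print(out)  # ['Vilan,$5.95,Developer', 'Dils,$7.95', 'Raj,Fullstack Developer']
```

To fill the gaps, you need a fixed list of field names and a lookup for each one per role.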
kiric8494

1 Answer


Here is how you can do that: first collect all possible fields, then search for each of them in every role.

EDIT: As discussed in the comments, I changed your xml_data. In the 3rd role, the name field now exists but has no value.

import xml.etree.ElementTree as ET

xml_data = """
<job_details>
    <role>
        <name>Vilan</name>
        <salary>$5.95</salary>
        <job_description>Developer</job_description>
    </role>
    <role>
        <name>Dils</name>
        <salary>$7.95</salary>
    </role>
    <role>
        <name></name>
        <job_description>Fullstack Developer</job_description>
    </role>
</job_details>
"""
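out = []
get_root_element = ET.fromstring(xml_data)
# Collect every tag that occurs in any <role>. dict.fromkeys removes duplicates
# like a set does, but keeps first-seen order. A plain set has no guaranteed
# order, so the columns could come out shuffled between runs.
all_fields = list(dict.fromkeys(i.tag for role in get_root_element for i in role))
for item in get_root_element:
    tmp = []
    for field in all_fields:
        # find() returns None when the element is missing from this <role>
        res = item.find(field)
        if res is not None:
            if res.text is not None:
                tmp.append(res.text)
            else:
                # element exists but is empty, e.g. <name></name>
                tmp.append("null")
        else:
            tmp.append("null")
    out.append(",".join(tmp))

print(out)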

['Vilan,$5.95,Developer', 'Dils,$7.95,null', 'null,null,Fullstack Developer']
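A more compact variant of the same idea follows. It is a sketch, not a replacement for the answer. `Element.findtext(tag)` returns `None` when the child element is missing and an empty string when it exists without text, so `or "null"` covers both cases in one expression. The helper name `rows_with_nulls` and its `missing` parameter are hypothetical:

```python
import xml.etree.ElementTree as ET

def rows_with_nulls(xml_text, missing="null"):
    """Join each role's fields, using `missing` for absent or empty fields (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Ordered, de-duplicated union of child tags across all roles.
    fields = list(dict.fromkeys(child.tag for role in root for child in role))
    out = []
    for role in root:
        # findtext -> None if the element is missing, "" if it has no text;
        # both are falsy, so `or missing` substitutes the placeholder.
        values = [role.findtext(field) or missing for field in fields]
        out.append(",".join(values))
    return out

xml_data = """
<job_details>
    <role>
        <name>Vilan</name>
        <salary>$5.95</salary>
        <job_description>Developer</job_description>
    </role>
    <role>
        <name>Dils</name>
        <salary>$7.95</salary>
    </role>
    <role>
        <name></name>
        <job_description>Fullstack Developer</job_description>
    </role>
</job_details>
"""

result = rows_with_nulls(xml_data)
print(result)  # ['Vilan,$5.95,Developer', 'Dils,$7.95,null', 'null,null,Fullstack Developer']
```

Whitespace-only text such as `<name> </name>` is truthy, so this sketch keeps it as-is. Add `.strip()` before the `or` if that should also count as null.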
Rabinzel
  • Awesome, let me understand how you implemented the approach. – kiric8494 May 22 '22 at 13:33
  • Is the code going to work for this scenario, where a field has no value, like the following: ``? It should populate null. – kiric8494 May 22 '22 at 13:35
  • In your first attempt you just looped through the fields and took `field.text`. Fields can be missing, and you need to know which one is missing, so you have to define all possible fields as a list. See the [documentation](https://docs.python.org/3/library/xml.etree.elementtree.html) for how to access values when searching by field name. If the element exists, you append its value. If it doesn't exist, `item.find(...)` returns `None` and you can append whatever value you want (here `null`). – Rabinzel May 22 '22 at 13:36
  • @R It cannot handle this: ` Bob `. Is there any solution for this? – kiric8494 May 22 '22 at 13:39
  • I'm just debugging what happens when the field exists, but there is no value in it. – Rabinzel May 22 '22 at 13:39
  • If a field exists but has no value in it, how do I treat it as null in that case? – kiric8494 May 22 '22 at 13:41
  • I'll update my code – Rabinzel May 22 '22 at 13:49
  • Yes, I did the same thing; thanks for the approach. I will be posting a few more questions related to XML as and when I encounter problems, and I will need your help on this topic. Please stay connected. – kiric8494 May 22 '22 at 13:53
  • Instead of hardcoding the fields in a list, we can make it dynamic, right? `all_fields = ["name", "salary", "job_description"] ` – kiric8494 May 22 '22 at 13:55
  • I was looking for a way to get a list of unique fields from the element tree, but I couldn't find anything. Maybe there is something I don't know of. What you can definitely do is loop through your tree twice: the first time you just build a set of all fields that occur, and in the second pass you search for them. Not the most elegant way, but it would work. – Rabinzel May 22 '22 at 13:58
  • I have updated the code to get the headers dynamically: `all_fields = [ i.tag for i in get_root_element[0]]`. This considers only the first child element of the root. – kiric8494 May 22 '22 at 13:59
  • yes, but be aware that this only works if you know that the first `` has all fields. – Rabinzel May 22 '22 at 14:03
  • I agree with you. A manual check is always needed to ensure that all fields are covered. – kiric8494 May 22 '22 at 14:04
  • I changed `all_fields` again so you don't need to check manually. See the updated answer. – Rabinzel May 22 '22 at 14:08
  • I did the same thing too: getting all tags and using a set to get the distinct ones, ha ha. Thanks! – kiric8494 May 22 '22 at 14:14
  • Regarding your edit: sorry, but this is not Pythonic and shouldn't be done like this. It doesn't help the user understand. The one-line list comprehension is more self-explanatory than nested for loops with index ranges and lengths. Please have a look [here](https://stackoverflow.com/q/19184335/15521392) to get a better understanding of what you are trying to achieve. – Rabinzel May 22 '22 at 14:23
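The thread discusses two ways of deriving the field list dynamically. The sketch below shows why reading the tags from the first `<role>` only is fragile: if the first role happens to lack a field, that field is never searched for. The XML is a made-up example whose first role has no `<job_description>`:

```python
import xml.etree.ElementTree as ET

# Made-up data: the first <role> is missing <job_description>.
xml_data = """
<job_details>
    <role>
        <name>Dils</name>
        <salary>$7.95</salary>
    </role>
    <role>
        <name>Raj</name>
        <job_description>Fullstack Developer</job_description>
    </role>
</job_details>
"""

root = ET.fromstring(xml_data)

# Approach from the comments: tags of the first child element only.
first_only = [child.tag for child in root[0]]

# Union over every role, de-duplicated, keeping first-seen order.
all_roles = list(dict.fromkeys(child.tag for role in root for child in role))

print(first_only)  # ['name', 'salary']
print(all_roles)   # ['name', 'salary', 'job_description']
```

Scanning all roles costs one extra pass over the tree, but it removes the need for the manual check mentioned above.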