I have an XML file that looks like this:

<elements>
   <topic id1=111 id2=222>
      <title>title1</title>
      <topic id1=333 id2=444>
         <title>title2</title>
      </topic>
      <topic id1=555 id2=666>
         <title>title3</title>
      </topic>
   </topic>
   <topic id1=777 id2=888>
      <title>title3</title>
   </topic>
</elements>

As output, I need the text of every title element together with the id1 and id2 attributes of its topic, like this:

[[title1,111,222],[title2,333,444],...]

I'll write them to a CSV file later, and I know how to do that part; it's this extraction step I'm stuck on. I have seen posts like this one, but I can't get the hang of pulling the information from all of the elements at once. I'm using Python 3.3, in case it matters. Any ideas are greatly appreciated.

Thanks!!

rodrigocf
  • I really don't know if it can be considered duplicate because of all the child elements involved and the required iteration (?). – rodrigocf Jul 01 '14 at 23:40
  • Well it was either that or vote to close because you didn't try anything. Either way the answer to "how do I read XML in Python" is in the answer you linked. – Cfreak Jul 01 '14 at 23:41
  • Not a valid xml file element... – dawg Jul 02 '14 at 05:29

1 Answer

Python 2.7.5

text = '''<elements>
   <topic id1=111 id2=222>
      <title>title1</title>
      <topic id1=333 id2=444>
         <title>title2</title>
      </topic>
      <topic id1=555 id2=666>
         <title>title3</title>
      </topic>
   </topic>
   <topic id1=777 id2=888>
      <title>title3</title>
   </topic>
</elements>'''

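# BeautifulSoup 3 (the old `BeautifulSoup` package, Python 2 only)
import BeautifulSoup as bs

results = []

soup = bs.BeautifulSoup(text)
# findAll searches the whole tree, so nested topics are included too
topics = soup.findAll('topic')

for x in topics:
    e = []
    # find() returns the first <title> below this topic; in this input that is its own title
    e.append(x.find('title').text)
    # in BeautifulSoup 3, attrs is a list of (name, value) tuples
    e.extend(a[1] for a in x.attrs)
    results.append(e)

print results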

[[u'title1', u'111', u'222'], [u'title2', u'333', u'444'], [u'title3', u'555', u'666'], [u'title3', u'777', u'888']]
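
Since the question mentions Python 3.3, here is a rough sketch of the same idea using only the standard library's html.parser, which tolerates the unquoted attribute values (a strict XML parser would reject them). This is not the code above, just one possible alternative: the TopicCollector class and the collect_topics helper are hypothetical names made up for this example, and it assumes each title belongs to the innermost topic that is open when the title appears.

```python
from html.parser import HTMLParser

text = '''<elements>
   <topic id1=111 id2=222>
      <title>title1</title>
      <topic id1=333 id2=444>
         <title>title2</title>
      </topic>
      <topic id1=555 id2=666>
         <title>title3</title>
      </topic>
   </topic>
   <topic id1=777 id2=888>
      <title>title3</title>
   </topic>
</elements>'''


class TopicCollector(HTMLParser):
    """Hypothetical helper: collects [title, id1, id2] for every <topic>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one [title, id1, id2] row per topic, in document order
        self._open = []         # stack of rows for the topics that are currently open
        self._in_title = False  # True while we are between <title> and </title>

    def handle_starttag(self, tag, attrs):
        if tag == 'topic':
            a = dict(attrs)     # attrs arrives as a list of (name, value) pairs
            row = ['', a.get('id1'), a.get('id2')]
            self.rows.append(row)
            self._open.append(row)
        elif tag == 'title':
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'topic' and self._open:
            self._open.pop()
        elif tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        # Assumption: a title belongs to the innermost open topic.
        if self._in_title and self._open:
            self._open[-1][0] += data


def collect_topics(markup):
    parser = TopicCollector()
    parser.feed(markup)
    parser.close()
    return parser.rows


rows = collect_topics(text)
print(rows)
```

Each row is a plain list of strings, so the result can be handed straight to csv.writer(...).writerows(rows) for the CSV step.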
furas