I cannot retrieve the gender field from the following XML using Python. I tried the following:

import requests
import xml.etree.ElementTree as ET
# Fetch the author record and keep the response so it can be parsed below
req = requests.get('http://www.librarything.com/services/rest/1.1/method=librarything.ck.getauthor&id=216&apikey=d231aa37c9b4f5d304a60a3d0ad1dad4')
root = ET.fromstring(req.text)
print(root.find(".//field[@type='5']"))

I expect to get the element, but I get None instead. The XML looks like this:

<response stat="ok">
<ltml xmlns="http://www.librarything.com/" version="1.1">
<item id="216" type="author">
<author id="216" authorcode="clarkesusanna">...</author>
<url>http://www.librarything.com/author/216</url>
<commonknowledge>
<fieldList>
<field type="22" name="canonicalname" displayName="Canonical name">...</field>
<field type="20" name="biography" displayName="Short biography">...</field>
<field type="33" name="relationships" displayName="Relationships">...</field>
<field type="18" name="nationality" displayName="Nationality">...</field>
<field type="32" name="othernames" displayName="Other names">...</field>
<field type="17" name="occupations" displayName="Occupations">...</field>
<field type="9" name="education" displayName="Education">...</field>
<field type="6" name="placesofresidence" displayName="Places of residence">...</field>
<field type="44" name="birthplace" displayName="Birthplace">...</field>
<field type="31" name="legalname" displayName="Legal name">...</field>
<field type="4" name="awards" displayName="Awards and honors">...</field>
<field type="8" name="birthdate" displayName="Birthdate">...</field>
<field type="5" name="gender" displayName="Gender">
<versionList>
<version id="7537" archived="0" lang="eng">
<date timestamp="1191988667">Tue, 09 Oct 2007 23:57:47 -0400</date>
<person id="1496">
<name>felius</name>
<url>http://www.librarything.com/profile/felius</url>
</person>
<factList>
<fact>female</fact>
</factList>
</version>
</versionList>
</field>
</fieldList>
</commonknowledge>
</item>
<legal>
By using this data you agree to the LibraryThing API terms of service.
</legal>
</ltml>
</response>

Can someone please help me understand what I am doing wrong?

Martijn Pieters
freal
  • Welcome to Stack Overflow! It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). The more detail you provide, the more answers you are likely to receive. – Martijn Pieters Aug 10 '14 at 16:27
  • Also, please include a *sample* of the XML **here**, not on an external site. – Martijn Pieters Aug 10 '14 at 16:28
  • Thanks, I think my question is more specific now. – freal Aug 10 '14 at 16:39
  • @user3927351: Yes, this is now a good question. It would be still better if you stripped down the XML even further, and embedded it as a triple-quoted string in your sample code, so we could just copy and paste your example into a Python interpreter to make it easier to debug. But good enough for a +1 from me. – abarnert Aug 10 '14 at 19:06
  • You have a namespaced XML document, see [Parsing XML with namespace in Python ElementTree](http://stackoverflow.com/q/14853243) – Martijn Pieters Aug 10 '14 at 20:48
  • Also related (but using the `lxml` implementation of the ElementTree API): [Parse large XML with lxml](http://stackoverflow.com/q/16565995) – Martijn Pieters Aug 10 '14 at 20:53

1 Answer

The first thing you should test is what happens if you simplify your XPath:

>>> print(root.find(".//field"))
None

So, what's going on? You don't have any elements whose tag is field. The ltml element declares a default namespace (xmlns="http://www.librarything.com/"), which means every element beneath it has the qualified tag '{http://www.librarything.com/}field'. You can see this pretty easily:

>>> print(list(root))  # getchildren() was removed in Python 3.9
[<Element '{http://www.librarything.com/}item' at 0x1047580e8>]
>>> print(root.find(".//{http://www.librarything.com/}field"))
<Element '{http://www.librarything.com/}field' at 0x1047582c8>
>>> print(root.find(".//{http://www.librarything.com/}field[@type='5']"))
<Element '{http://www.librarything.com/}field' at 0x104758688>

If you want to know more, there are multiple questions on this site about how ETree deals with namespaces (from a quick search, 1 and 2 look relevant), and detailed information in the documentation; trying to explain it all in yet another answer would just lead to an inferior answer to the existing ones.
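
As a rough illustration of the same fix without spelling out the full URI in every path, the sketch below passes a prefix map to find(). It assumes a trimmed copy of the response from the question embedded as a string; the prefix "lt" and the names SAMPLE, NS, gender_field and wildcard_field are made up for this example and are not part of the LibraryThing API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question (most fields dropped).
SAMPLE = """\
<response stat="ok">
<ltml xmlns="http://www.librarything.com/" version="1.1">
<item id="216" type="author">
<commonknowledge>
<fieldList>
<field type="8" name="birthdate" displayName="Birthdate"/>
<field type="5" name="gender" displayName="Gender">
<versionList>
<version id="7537" archived="0" lang="eng">
<factList>
<fact>female</fact>
</factList>
</version>
</versionList>
</field>
</fieldList>
</commonknowledge>
</item>
</ltml>
</response>
"""

# "lt" is an arbitrary prefix chosen for this sketch; only the URI has to
# match the xmlns value declared in the document.
NS = {"lt": "http://www.librarything.com/"}

root = ET.fromstring(SAMPLE)

# ElementTree expands lt:field to {http://www.librarything.com/}field
# before matching, so this finds the namespaced element.
gender_field = root.find(".//lt:field[@type='5']", NS)

# The nested <fact> element is in the same namespace, so it needs the prefix too.
gender = gender_field.findtext(".//lt:fact", namespaces=NS)
print(gender)

# Python 3.8 and later also accept a {*} wildcard that matches any namespace.
wildcard_field = root.find(".//{*}field[@type='5']")
```

The prefix map keeps the paths readable when several lookups share one namespace; the {*} form is shorter still, but it will also match a field element from any other namespace.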

abarnert
  • Thanks, that was my problem... now it works: root.find(".//{http://www.librarything.com/}field[@type='5']") – freal Aug 10 '14 at 21:07