0

The title is a mouthful, but it describes what I want. I am parsing an XML document with BeautifulSoup. The format of my XML is as follows:

<properties>
    <place>
        <house_id>12345</house_id>
        <appliances>Fridge, Oven</appliances>
        <price>350000</price>
    </place>
    <place>
        <house_id>6789</house_id>
        <appliances>Heater, Microwave, Fridge</appliances>
        <price>870000</price>
    </place>
</properties>

Given a specific value for the house_id tag, I want the text inside the appliances tag of that same place. For instance, given 12345, I want to get back Fridge, Oven. I have not yet found an easy way to do this with BeautifulSoup.

Yitzhak Khabinsky
user3611

3 Answers

1

You can use the general sibling combinator (~), which selects an <appliances> element that follows the matching <house_id> under the same parent:

soup.select_one("house_id:-soup-contains('12345') ~ appliances").text

Or you can find the <house_id> tag containing specific text, and then call find_next() to locate the <appliances> tag:

print(soup.find("house_id", string="12345").find_next("appliances").text)
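
One difference between the two lookups above is worth seeing on data: `:-soup-contains()` is a substring test, while `string=` compares the whole text of the tag. The sketch below uses made-up data (a hypothetical house_id of 123456 placed first) and assumes a BeautifulSoup install recent enough to support both `:-soup-contains` and the `string=` argument.

```python
from bs4 import BeautifulSoup

# Hypothetical data: the first house_id only contains "12345" as a substring.
xml = """
<properties>
    <place>
        <house_id>123456</house_id>
        <appliances>Dryer</appliances>
    </place>
    <place>
        <house_id>12345</house_id>
        <appliances>Fridge, Oven</appliances>
    </place>
</properties>"""

soup = BeautifulSoup(xml, "xml")

# :-soup-contains() is a substring test, so the 123456 entry matches first.
by_selector = soup.select_one("house_id:-soup-contains('12345') ~ appliances").text

# string= compares the tag's whole text, so only the exact id matches.
by_exact = soup.find("house_id", string="12345").find_next("appliances").text

print(by_selector)  # Dryer
print(by_exact)     # Fridge, Oven
```

If the ids in the real file can be prefixes of one another, the exact-match form is probably the safer of the two.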
MendelG
  • Thank you! Would this work if appliances was not directly the next tag, but a few tags later or earlier? I only gave dummy data to describe my question; in reality it won't be directly next and might be a few tags before or after it – user3611 Oct 20 '21 at 20:53
  • @user3611 I think it should work even if it's not directly next. If it's upwards, you can use `find_previous()` instead of `find_next()` – MendelG Oct 20 '21 at 20:54
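
To illustrate the point made in the comments: `find_next()` and `find_previous()` walk the document in order and do not stop at the end of the enclosing <place>. A small sketch on made-up data, assuming <appliances> comes before <house_id> in the first place:

```python
from bs4 import BeautifulSoup

# Hypothetical data: in the first <place>, <appliances> precedes <house_id>.
xml = """
<properties>
    <place>
        <appliances>Oven, Cleaner</appliances>
        <house_id>1296</house_id>
    </place>
    <place>
        <house_id>6789</house_id>
        <appliances>Heater</appliances>
    </place>
</properties>"""

soup = BeautifulSoup(xml, "xml")
id_tag = soup.find("house_id", string="1296")

# Searching forward leaves the first <place> and lands in the second one.
nxt = id_tag.find_next("appliances").text

# Searching backward finds the tag that actually belongs to house 1296.
prev = id_tag.find_previous("appliances").text

print(nxt)   # Heater
print(prev)  # Oven, Cleaner
```

So the direction has to match the real layout of the file; if the order can vary between places, searching inside the parent element avoids the problem.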
0

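Based on your input XML, the following XPath expression selects what you need. BeautifulSoup itself does not evaluate XPath, so run the expression with a library that does, such as lxml (see "Can we use XPath with BeautifulSoup?").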

XPath

/properties/place[house_id="12345"]/appliances
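
As a minimal sketch of how the expression above could be evaluated, here is one way to do it with lxml. This assumes lxml is installed and that the XML is available as a string; the variable names are illustrative only.

```python
from lxml import etree

xml = """
<properties>
    <place>
        <house_id>12345</house_id>
        <appliances>Fridge, Oven</appliances>
        <price>350000</price>
    </place>
    <place>
        <house_id>6789</house_id>
        <appliances>Heater, Microwave, Fridge</appliances>
        <price>870000</price>
    </place>
</properties>"""

root = etree.fromstring(xml)

# xpath() returns a list of matching elements (empty if nothing matches).
matches = root.xpath('/properties/place[house_id="12345"]/appliances')
result = matches[0].text if matches else None

print(result)  # Fridge, Oven
```

Because the predicate is applied to <place>, the position of <appliances> relative to <house_id> inside the place does not matter.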
Yitzhak Khabinsky
0

This works whether the <appliances> tag comes before or after <house_id>.

Use findParent() (named find_parent() in the current BeautifulSoup 4 API) to get the parent of <house_id>, then find the <appliances> tag inside that parent.

Here is the code:

from bs4 import BeautifulSoup

s = """
<properties>
    <place>
        <house_id>12345</house_id>
        <appliances>Fridge, Oven</appliances>
        <price>350000</price>
    </place>
    <place>
        <house_id>6789</house_id>
        <appliances>Heater, Microwave, Fridge</appliances>
        <price>870000</price>
    </place>
    <place>
        <appliances>Oven, Cleaner, Microwave</appliances>
        <price>700000</price>
        <house_id>1296</house_id>
    </place>
</properties>"""

soup = BeautifulSoup(s, 'xml')


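def get_appliance(t, soup):
    # Locate the <house_id> whose text is exactly t.
    h = soup.find('house_id', string=t)
    # Search the enclosing <place>, so the order of the tags does not matter.
    appliance = h.find_parent().find('appliances')
    return appliance.text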


print(get_appliance('12345', soup))
print(get_appliance('1296', soup))

Output:

Fridge, Oven
Oven, Cleaner, Microwave
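
The function above raises AttributeError when the id is not in the file, because find() returns None. A hedged variant that returns None instead might look like the sketch below; the name `find_appliances` is made up for illustration, and it assumes every <house_id> sits inside a <place>.

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_appliances(soup: BeautifulSoup, house_id: str) -> Optional[str]:
    """Return the appliances text for house_id, or None if it is missing."""
    id_tag = soup.find("house_id", string=house_id)
    if id_tag is None:
        return None  # no such house_id in the document
    place = id_tag.find_parent("place")
    target = place.find("appliances") if place is not None else None
    return target.get_text(strip=True) if target is not None else None


xml = """
<properties>
    <place>
        <house_id>12345</house_id>
        <appliances>Fridge, Oven</appliances>
    </place>
    <place>
        <appliances>Oven, Cleaner, Microwave</appliances>
        <house_id>1296</house_id>
    </place>
</properties>"""

soup = BeautifulSoup(xml, "xml")
print(find_appliances(soup, "12345"))  # Fridge, Oven
print(find_appliances(soup, "1296"))   # Oven, Cleaner, Microwave
print(find_appliances(soup, "0000"))   # None
```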
Ram