Extract elements from XML file using Python

Question

The link below gives us the list of ingredients in recipelist. I would like to extract the names of the ingredient and save it to another file using python. http://stream.massey.ac.nz/file.php/6087/Eva_Material/Tutorials/recipebook.xml

So far I have tried using the following code, but it gives me the complete recipe not the names of the ingredients:

from xml.sax.handler import ContentHandler
import xml.sax
import sys
def recipeBook(): 
    path = "C:\Users\user\Desktop"
    basename = "recipebook.xml"
    filename = path+"\\"+basename
    file=open(filename,"rt")
    # find contents 
    contents = file.read()

    class textHandler(ContentHandler):
      def characters(self, ch):
      sys.stdout.write(ch.encode("Latin-1"))
    parser = xml.sax.make_parser()
    handler = textHandler( )
    parser.setContentHandler(handler)
    parser.parse("C:\Users\user\Desktop\\recipebook.xml")



  file.close()

How do I extract the name of each ingredient and save them to another file?

Unrelated: You should look at `os.path` and raw strings rather than dealing with windows paths like that. — Daenyth, May 07 '12 at 01:35
Put the XML string as example in the question; it is easier to give specific answer when we can see what data you are working with. Your URL is inaccessible to those without a Massey ID. Also consider navigating the XML tree using [ElementTree](http://docs.python.org/library/xml.etree.elementtree.html). — daedalus, May 07 '12 at 04:28
PS: [This answer, using ElementTree](http://stackoverflow.com/a/5533742/1290420) should get you there. — daedalus, May 07 '12 at 04:58

score 3 · Answer 1 · answered Jan 14 '13 at 15:59

@Neha

I guess you have solved your request by now, here is a little piece I put together using the tutorial at http://lxml.de/tutorial.html. The XML file is saved in 'rough_data.xml'

import xml.etree.cElementTree as etree

xmlDoc = open('rough_data.xml', 'r')
xmlDocData = xmlDoc.read()
xmlDocTree = etree.XML(xmlDocData)

for ingredient in xmlDocTree.iter('ingredient'):
    print ingredient[0].text

To all experienced Python programmers reading this, kindly improve this "newbie" code.

Note: The lxml package looks very good, it definitely worth using. Thanks

omu_negru · Answer 2 · 2012-05-10T14:13:09.650

1

Please place the relevant XML text in order to receive a proper answer. Also please consider using lxml for anything xml specific (including html) .

try this :

from lxml import etree

tree=etree.parse("your xml here")
all_recipes=tree.xpath('./recipebook/recipe')
recipe_names=[x.xpath('recipe_name/text()') for x in all_recipes]
ingredients=[x.getparent().xpath('../ingredient_list/ingredients') for x in recipe_names]
ingredient_names=[x.xpath('ingredient_name/text()') for x in ingredients]

Here is the beginning only, but i think you get the idea from here -> get the parent from each ingredient_name and the ingredients/quantities from there and so on. You can't really do any other kind of search i think due to the structured nature of the document.

you can read more on [www.lxml.de]

edited May 10 '12 at 14:13

answered May 07 '12 at 06:59

omu_negru

4,642
4
27
38

Omelette eggs 2 milk 50 ml tomato 1 oil 1 ts Sandwich bread 2 slices butter – Neha Nalawade May 08 '12 at 08:08
thats the xml file I am working on and i just need the ingredients from the file i.e. milk, eggs etc. i need to extract the ingredients and write them to another file. thanks – Neha Nalawade May 08 '12 at 08:09

score 0 · Answer 3 · answered May 18 '12 at 22:22

Some time ago I made a series of screencasts that explain how to collect data from sites. The code is in Python and there are 2 videos about XML parsing with the lxml library. All the videos are posted here: http://railean.net/index.php/2012/01/27/fortune-cowsay-python-video-tutorial

The ones you want are:

XPath experiments and query examples
Python and LXML, examples of XPath queries with Python
Automate page retrieving via HTTP and HTML parsing with lxml

You'll learn how to write and test XPath queries, as well as how to run such queries in Python. The examples are straightforward, I hope you'll find them helpful.

Extract elements from XML file using Python

3 Answers3

Linked