I have the following XML file:

'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\r\n<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14"><w:body><w:p w:rsidR="00706A37" w:rsidRPr="004A1CE5" w:rsidRDefault="004A1CE5"><w:pPr><w:pStyle w:val="Heading1"/><w:numPr><w:ilvl w:val="12"/><w:numId w:val="0"/></w:numPr><w:rPr><w:sz w:val="28"/><w:szCs w:val="28"/></w:rPr></w:pPr><w:commentRangeStart w:id="0"/><w:r w:rsidRPr="004A1CE5"><w:rPr><w:sz w:val="28"/><w:szCs w:val="28"/></w:rPr><w:t>H</w:t></w:r><w:commentRangeEnd w:id="0"/><w:r w:rsidR="00A23794"><w:rPr><w:rStyle w:val="CommentReference"/> 

I need to extract the value of the id attribute of a <w:commentRangeStart> tag. I have looked through many questions on SO and tried the following approach.

I tried iterating over every p that has a commentRangeStart child and retrieving its attrib. This returned nothing:

for p in lxml_tree.xpath('.//w:p/commentRangeStart',namespaces = {'w':w}):
    print p.attrib

I tried various combinations with 'commentRangeStart[@id]' and commentRangeStart/@id, but none worked. I referred to many questions, and one of them is here.
I would prefer a way that goes over every p and then searches for the comment tag, like:

for p in lxml_tree.xpath('.//w:p',namespaces = {'w':w}):  
    p.xpath(./w:commentRangeStart/...)

and so on.

What's wrong with my expression?

Hypothetical Ninja

1 Answer

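You need to qualify the element name with its namespace prefix. An unprefixed commentRangeStart in XPath matches only elements in no namespace, so use w:commentRangeStart: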

# w holds the WordprocessingML namespace URI that the document binds to the w: prefix
for p in root.xpath('.//w:p/w:commentRangeStart', namespaces={'w': w}):
    print(p.attrib)

output:

{'{http://schemas.openxmlformats.org/wordprocessingml/2006/main}id': '0'}

Alternative:

for id_ in root.xpath('.//w:p/w:commentRangeStart/@w:id', namespaces={'w': w}):
    print(id_)

output:

0
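
To iterate over every p and then look for the comment tag inside it, the same qualified name can be used in a relative XPath called on each paragraph. The following is only a minimal sketch, not tested against the full document: SAMPLE is a hypothetical, trimmed-down stand-in for the real file, and comment_ids_per_paragraph is a helper name made up for illustration.

```python
from lxml import etree

# Namespace URI bound to the w: prefix in the question's document.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical sample: one paragraph with a comment range, one without.
SAMPLE = (
    '<w:document xmlns:w="%s"><w:body>'
    '<w:p><w:commentRangeStart w:id="0"/><w:r><w:t>H</w:t></w:r>'
    '<w:commentRangeEnd w:id="0"/></w:p>'
    '<w:p><w:r><w:t>no comment here</w:t></w:r></w:p>'
    '</w:body></w:document>' % W
)


def comment_ids_per_paragraph(root):
    """Return one list of comment ids per w:p (empty if it has none)."""
    result = []
    for p in root.xpath('.//w:p', namespaces=NS):
        # Relative path: direct w:commentRangeStart children of this p only.
        ids = p.xpath('./w:commentRangeStart/@w:id', namespaces=NS)
        result.append([str(i) for i in ids])
    return result


root = etree.fromstring(SAMPLE)

# The unprefixed step looks for elements in no namespace, so it finds nothing.
unqualified = root.xpath('.//w:p/commentRangeStart', namespaces=NS)

per_paragraph = comment_ids_per_paragraph(root)
print(len(unqualified))
print(per_paragraph)
```

Because the outer loop visits every p, any per-paragraph operation can go inside it whether or not the inner XPath finds a comment id. If the comment tag can be nested deeper than a direct child, './/w:commentRangeStart/@w:id' on p would search all descendants of that paragraph instead.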
falsetru
  • Can I replace the extra w's with //? – Hypothetical Ninja Oct 07 '14 at 13:51
  • @Swordy, if you replace it, you will get an empty element list. – falsetru Oct 07 '14 at 13:52
  • Could you add the code for a two-level iteration? I tried this: `for p in lxml_tree.xpath('.//w:p/w', namespaces={'w':w}): for id_ in p.xpath('./w:commentRangeStart/@w:id',namespaces={'w':w}): print id_` but it doesn't print anything. – Hypothetical Ninja Oct 07 '14 at 13:55
  • The reason is that I need to carry out an operation for every p, irrespective of whether it has a comment tag or not. – Hypothetical Ninja Oct 07 '14 at 13:57
  • @Swordy, http://asciinema.org/a/12781 (BTW, the given xml is not complete. So I added closing tags.) – falsetru Oct 07 '14 at 13:57
  • @Swordy, do you mean this? `for id_ in root.xpath('.//w:p//w:commentRangeStart/@w:id', namespaces={'w': w}): print id_` – falsetru Oct 07 '14 at 14:00
  • @Swordy, or `for p in root.xpath('.//w:p', namespaces={'w': w}): for id_ in p.xpath('.//w:commentRangeStart/@w:id', namespaces={'w': w}): print id_` – falsetru Oct 07 '14 at 14:01
  • Yeah, it was too big, so I just used a sample here. How should I implement a two-level iteration, that is, over every p, then find the comment tags? No, my comment used two for loops. Yes, your 2nd comment. – Hypothetical Ninja Oct 07 '14 at 14:01