How to handle ':' in an XPath attribute name

Question

Recently I was helping a friend read the contents of a docx file and generate a new docx from it. I turned the docx into a zip by changing the file extension, and in 'word/document.xml' inside the zip, attributes such as 'w:val' always carry a 'w:' prefix.

import lxml.etree as ET
tree = ET.parse(xml_file_origin)
for paragraph in tree.xpath("//p[bookmarkStart[contains(@w:name, '_bookmark')]]"):
  print(paragraph)

The ':' then causes 'XPathEvalError: Invalid expression'. I looked for a solution in a cheatsheet but did not find one.

Could someone help me handle this problem with ':' in an XPath attribute name?

Here the colon is in an attribute name, not in an element name. (Sorry, I did not understand what 'namespace' means for an attribute when I received the duplicate notice.) From this answer, and from the official documentation, namespaces seem to relate to tags. Could someone tell me how to use a namespace with an attribute (preferably with code)?

The main body of my XML, which I forgot to post when I first opened the question:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture" xml:space="preserve">
  <w:body>
    <w:p>
      <w:pPr>
        <w:spacing w:before="61"/>
        <w:ind w:left="383" w:right="0" w:firstLine="0"/>
        <w:jc w:val="center"/>
        <w:rPr>
          <w:sz w:val="30"/>
        </w:rPr>
      </w:pPr>
      <w:bookmarkStart w:name="..." w:id="1"/>
      <w:bookmarkEnd w:id="1"/>
      <w:r>
        <w:rPr/>
      </w:r>
      <w:bookmarkStart w:name="_bookmark0" w:id="2"/>
      <w:bookmarkEnd w:id="2"/>
      <w:r>
        <w:rPr/>
      </w:r>
      <w:r>
        <w:rPr>
          <w:w w:val="95"/>
          <w:sz w:val="30"/>
        </w:rPr>
        <w:t>...</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

Sorry for asking such a newbie question. After reading this question and its related comments, and then printing the attrib to see what it contains, I found the following solution for searching for '_bookmark' in the 'w:name' attribute of 'w:bookmarkStart' elements:

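# Bind the 'w' prefix to the WordprocessingML namespace URI
init_ns_ = {'w': "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}
up_elem = tree_xml_.xpath("//w:document", namespaces=init_ns_)[0]
# nsmap holds every prefix declared on the root element
total_ns = up_elem.nsmap
# Attribute names take a namespace prefix just like element names
for bookmarkname in up_elem.xpath('.//w:bookmarkStart/@w:name', namespaces=total_ns):
    print(bookmarkname)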
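
For completeness, here is a minimal sketch of the query I originally wanted (paragraphs containing a bookmark whose name includes '_bookmark'), written with the same namespace mapping. The inline XML is a trimmed-down stand-in for my 'word/document.xml', and the bookmark name "intro" and the variable names are made up for illustration. As far as I understand, the prefix has to be given on every element and attribute name in the expression, and outside XPath the attribute is reachable through its '{namespace-uri}name' form.

```python
import lxml.etree as ET

# Hypothetical, trimmed-down stand-in for word/document.xml
XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:bookmarkStart w:name="intro" w:id="1"/>
      <w:bookmarkEnd w:id="1"/>
      <w:bookmarkStart w:name="_bookmark0" w:id="2"/>
      <w:bookmarkEnd w:id="2"/>
    </w:p>
    <w:p>
      <w:r><w:t>no bookmark here</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}  # maps the 'w' prefix used in the XPath to the namespace URI

root = ET.fromstring(XML)

# The original query, with the prefix on every element and attribute name
paragraphs = root.xpath(
    "//w:p[w:bookmarkStart[contains(@w:name, '_bookmark')]]", namespaces=NS
)

# Selecting the attribute values directly
names = root.xpath("//w:bookmarkStart/@w:name", namespaces=NS)

# Outside XPath, the attribute key is '{namespace-uri}localname'
first = root.find(".//w:bookmarkStart", namespaces=NS)
clark_name = first.get("{%s}name" % W_NS)

print(len(paragraphs), names, clark_name)
```

Only the first paragraph in this sample contains a matching bookmark, so the predicate filters out the second one.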
