I have a .docx document where one of the XWPFRun elements contains mc:AlternateContent and mc:Fallback elements. Is there any way to get Apache POI 5 to parse the content of the mc:Fallback element?

If I understand the .docx format correctly, a parser that cannot handle the mc:Choice content should read the content of mc:Fallback and use that instead, but POI does not do that as far as I can see. It just ignores the entire <mc:AlternateContent> tag, including the fallback.

So my question is: is there any way to get POI to parse the XML in mc:Fallback?

If this is not possible, I guess I will have to parse the XML myself for this special case. But then I run into the next problem: how do I do that? The XWPFRun class does not have a method to get all children, so there does not seem to be a way to traverse the tree or parse the XML directly.

The xml for the run is:

<xml-fragment xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex" xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex" xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex" xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex" xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex" xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex" xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex" xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex" xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink" xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml" xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex" xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid" xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml" xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" 
xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
  <w:rPr>
    <w:rFonts w:ascii="Avenir Book" w:eastAsia="Avenir Book" w:hAnsi="Avenir Book" w:cs="Avenir Book"/>
    <w:noProof/>
  </w:rPr>
  <mc:AlternateContent>
    <mc:Choice Requires="wps">
      <w:drawing>
        <wp:anchor distT="152400" distB="152400" distL="152400" distR="152400" simplePos="0" relativeHeight="251663360" behindDoc="0" locked="0" layoutInCell="1" allowOverlap="1" wp14:anchorId="2C934A43" wp14:editId="2994BC37">
          <wp:simplePos x="0" y="0"/>
          <wp:positionH relativeFrom="margin">
            <wp:posOffset>3060028</wp:posOffset>
          </wp:positionH>
          <wp:positionV relativeFrom="line">
            <wp:posOffset>250486</wp:posOffset>
          </wp:positionV>
          <wp:extent cx="3175000" cy="1625600"/>
          <wp:effectExtent l="0" t="0" r="0" b="0"/>
          <wp:wrapThrough wrapText="bothSides" distL="152400" distR="152400">
            <wp:wrapPolygon edited="1">
              <wp:start x="-43" y="-84"/>
              <wp:lineTo x="-43" y="0"/>
              <wp:lineTo x="-43" y="21600"/>
              <wp:lineTo x="-43" y="21684"/>
              <wp:lineTo x="0" y="21684"/>
              <wp:lineTo x="21600" y="21684"/>
              <wp:lineTo x="21643" y="21684"/>
              <wp:lineTo x="21643" y="21600"/>
              <wp:lineTo x="21643" y="0"/>
              <wp:lineTo x="21643" y="-84"/>
              <wp:lineTo x="21600" y="-84"/>
              <wp:lineTo x="0" y="-84"/>
              <wp:lineTo x="-43" y="-84"/>
            </wp:wrapPolygon>
          </wp:wrapThrough>
          <wp:docPr id="1073741830" name="officeArt object"/>
          <wp:cNvGraphicFramePr/>
          <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
            <a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
              <wps:wsp>
                <wps:cNvSpPr txBox="1"/>
                <wps:spPr>
                  <a:xfrm>
                    <a:off x="0" y="0"/>
                    <a:ext cx="3175000" cy="1625600"/>
                  </a:xfrm>
                  <a:prstGeom prst="rect">
                    <a:avLst/>
                  </a:prstGeom>
                  <a:noFill/>
                  <a:ln w="12700" cap="flat">
                    <a:solidFill>
                      <a:srgbClr val="000000"/>
                    </a:solidFill>
                    <a:prstDash val="solid"/>
                    <a:miter lim="400000"/>
                  </a:ln>
                  <a:effectLst/>
                </wps:spPr>
                <wps:txbx>
                  <w:txbxContent>
                    <w:p w14:paraId="6BDD3C3F" w14:textId="77777777" w:rsidR="004221F0" w:rsidRDefault="00DB66DB">
                      <w:pPr>
                        <w:pStyle w:val="Brdtekst"/>
                        <w:rPr>
                          <w:rFonts w:ascii="Proxima Nova" w:eastAsia="Proxima Nova" w:hAnsi="Proxima Nova" w:cs="Proxima Nova"/>
                          <w:b/>
                          <w:bCs/>
                        </w:rPr>
                      </w:pPr>
                      <w:r>
                        <w:rPr>
                          <w:rFonts w:ascii="Proxima Nova" w:hAnsi="Proxima Nova"/>
                          <w:b/>
                          <w:bCs/>
                        </w:rPr>
                        <w:t xml:space="preserve">FAKTA </w:t>
                      </w:r>
                    </w:p>
                    <w:p w14:paraId="215D2258" w14:textId="77777777" w:rsidR="004221F0" w:rsidRDefault="004221F0">
                      <w:pPr>
                        <w:pStyle w:val="Brdtekst"/>
                        <w:rPr>
                          <w:rFonts w:ascii="Proxima Nova" w:eastAsia="Proxima Nova" w:hAnsi="Proxima Nova" w:cs="Proxima Nova"/>
                          <w:b/>
                          <w:bCs/>
                        </w:rPr>
                      </w:pPr>
                    </w:p>
                    <w:p w14:paraId="0CFB3B5A" w14:textId="77777777" w:rsidR="004221F0" w:rsidRDefault="00DB66DB">
                      <w:pPr>
                        <w:pStyle w:val="Brdtekst"/>
                      </w:pPr>
                      <w:r>
                        <w:rPr>
                          <w:rFonts w:ascii="Proxima Nova" w:hAnsi="Proxima Nova"/>
                        </w:rPr>
                        <w:t>Here is the text of the document</w:t>
                      </w:r>
                    </w:p>
                  </w:txbxContent>
                </wps:txbx>
                <wps:bodyPr wrap="square" lIns="50800" tIns="50800" rIns="50800" bIns="50800" numCol="1" anchor="t">
                  <a:noAutofit/>
                </wps:bodyPr>
              </wps:wsp>
            </a:graphicData>
          </a:graphic>
        </wp:anchor>
      </w:drawing>
    </mc:Choice>
    <mc:Fallback>
      <w:pict>
        <v:shapetype w14:anchorId="2C934A43" id="_x0000_t202" coordsize="21600,21600" o:spt="202" path="m,l,21600r21600,l21600,xe">
          <v:stroke joinstyle="miter"/>
          <v:path gradientshapeok="t" o:connecttype="rect"/>
        </v:shapetype>
        <v:shape id="officeArt object" o:spid="_x0000_s1026" type="#_x0000_t202" style="position:absolute;margin-left:240.95pt;margin-top:19.7pt;width:250pt;height:128pt;z-index:251663360;visibility:visible;mso-wrap-style:square;mso-wrap-distance-left:12pt;mso-wrap-distance-top:12pt;mso-wrap-distance-right:12pt;mso-wrap-distance-bottom:12pt;mso-position-horizontal:absolute;mso-position-horizontal-relative:margin;mso-position-vertical:absolute;mso-position-vertical-relative:line;v-text-anchor:top" wrapcoords="-47 -84 -47 0 -47 21600 -47 21684 -4 21684 21596 21684 21639 21684 21639 21600 21639 0 21639 -84 21596 -84 -4 -84 -47 -84" o:gfxdata="UEsDBBQABgAIAAAAIQC2gziS/gAAAOEBAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbJSRQU7DM..." filled="f" strokeweight="1pt">
          <v:stroke miterlimit="4"/>
          <v:textbox inset="4pt,4pt,4pt,4pt">
            <w:txbxContent>
              <w:p w14:paraId="6BDD3C3F" w14:textId="77777777" w:rsidR="004221F0" w:rsidRDefault="00DB66DB">
                <w:pPr>
                  <w:pStyle w:val="Brdtekst"/>
                  <w:rPr>
                    <w:rFonts w:ascii="Proxima Nova" w:eastAsia="Proxima Nova" w:hAnsi="Proxima Nova" w:cs="Proxima Nova"/>
                    <w:b/>
                    <w:bCs/>
                  </w:rPr>
                </w:pPr>
                <w:r>
                  <w:rPr>
                    <w:rFonts w:ascii="Proxima Nova" w:hAnsi="Proxima Nova"/>
                    <w:b/>
                    <w:bCs/>
                  </w:rPr>
                  <w:t xml:space="preserve">FAKTA </w:t>
                </w:r>
              </w:p>
              <w:p w14:paraId="215D2258" w14:textId="77777777" w:rsidR="004221F0" w:rsidRDefault="004221F0">
                <w:pPr>
                  <w:pStyle w:val="Brdtekst"/>
                  <w:rPr>
                    <w:rFonts w:ascii="Proxima Nova" w:eastAsia="Proxima Nova" w:hAnsi="Proxima Nova" w:cs="Proxima Nova"/>
                    <w:b/>
                    <w:bCs/>
                  </w:rPr>
                </w:pPr>
              </w:p>
              <w:p w14:paraId="0CFB3B5A" w14:textId="77777777" w:rsidR="004221F0" w:rsidRDefault="00DB66DB">
                <w:pPr>
                  <w:pStyle w:val="Brdtekst"/>
                </w:pPr>
                <w:r>
                  <w:rPr>
                    <w:rFonts w:ascii="Proxima Nova" w:hAnsi="Proxima Nova"/>
                  </w:rPr>
                  <w:t>Here is the text of the document</w:t>
                </w:r>
              </w:p>
            </w:txbxContent>
          </v:textbox>
          <w10:wrap type="through" anchorx="margin" anchory="line"/>
        </v:shape>
      </w:pict>
    </mc:Fallback>
  </mc:AlternateContent>
</xml-fragment>
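
If I do end up parsing the XML myself, a minimal sketch could look like the following. It uses only the JDK's DOM parser, not POI, and assumes the run XML is available as a string (for example from `run.getCTR().toString()`, which is where the fragment above came from). The class name `FallbackText` and the method `fallbackParagraphs` are hypothetical names chosen for illustration; the sketch only collects the text of each `w:p` below `mc:Fallback` and discards all formatting and positioning.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Apache POI.
final class FallbackText {
    static final String MC = "http://schemas.openxmlformats.org/markup-compatibility/2006";
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the concatenated w:t text of every w:p found below any mc:Fallback element. */
    static List<String> fallbackParagraphs(String runXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(runXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse run XML", e);
        }

        List<String> result = new ArrayList<>();
        NodeList fallbacks = doc.getElementsByTagNameNS(MC, "Fallback");
        for (int i = 0; i < fallbacks.getLength(); i++) {
            // Assumption: text boxes are not nested, so each w:p is visited once.
            NodeList paragraphs = ((Element) fallbacks.item(i)).getElementsByTagNameNS(W, "p");
            for (int j = 0; j < paragraphs.getLength(); j++) {
                NodeList texts = ((Element) paragraphs.item(j)).getElementsByTagNameNS(W, "t");
                StringBuilder text = new StringBuilder();
                for (int k = 0; k < texts.getLength(); k++) {
                    text.append(texts.item(k).getTextContent());
                }
                result.add(text.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Heavily shortened stand-in for the fragment above.
        String xml = "<w:r xmlns:w=\"" + W + "\" xmlns:mc=\"" + MC + "\""
                + " xmlns:v=\"urn:schemas-microsoft-com:vml\">"
                + "<mc:AlternateContent>"
                + "<mc:Choice Requires=\"wps\"><w:drawing/></mc:Choice>"
                + "<mc:Fallback><w:pict><v:shape><v:textbox><w:txbxContent>"
                + "<w:p><w:r><w:t xml:space=\"preserve\">FAKTA </w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>Here is the text of the document</w:t></w:r></w:p>"
                + "</w:txbxContent></v:textbox></v:shape></w:pict></mc:Fallback>"
                + "</mc:AlternateContent></w:r>";
        for (String paragraph : fallbackParagraphs(xml)) {
            System.out.println(paragraph);
        }
    }
}
```

This gives me the plain text, but it does not turn the nodes back into POI-managed objects, so the semantics of the floating text box would still have to be reconstructed by hand.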
MTilsted
  • A [while back](https://bz.apache.org/bugzilla/show_bug.cgi?id=61939), I did some work on this. Are you able to get the above fragment via [getCTR](https://poi.apache.org/apidocs/dev/org/apache/poi/xwpf/usermodel/XWPFRun.html#getCTR--)? When recursing down, e.g. with the help of XmlCursor, is the content of the fallback element of type XmlAnyType? – kiwiwings Jun 21 '21 at 22:00
  • @kiwiwings Yes, that fragment is the output of run.getCTR().toString(), where run is an XWPFRun. I got an XmlCursor from paragraph.getCTP().newCursor(), but the only way I found to iterate over it is by calling getDomNode(), and that just gives me all the XML nodes, which do match what is shown in the fragment. So I am not sure what you mean by XmlAnyType. Is there a way to convert the DOM node back to a class managed by POI? – MTilsted Jun 21 '21 at 22:45
  • What exactly are you trying to do? See https://stackoverflow.com/questions/46802369/replace-text-in-text-box-of-docx-by-using-apache-poi/46894499#46894499 for how to get/use the text runs within a text box. This gets text runs in text boxes from `mc:AlternateContent` as well as from `mc:Fallback`. – Axel Richter Jun 22 '21 at 04:30
  • @AxelRichter I am trying to import the text while preserving as much of the semantics as possible (such as the fact that it is a floating text box). That example looks useful, but it doesn't work with POI 5.0.0 because CTR.Factory.parse(obj.xmlText()) returns an XmlAnyTypeImpl, which I can't use as an argument to the constructor of XWPFRun. – MTilsted Jun 23 '21 at 19:07
  • Tried my code using `apache poi 5.0.0`. It worked without any problems. – Axel Richter Jun 24 '21 at 06:44
  • @AxelRichter Just wanted to say thank you. The reason it did not work for me at all initially was that I used xmlbeans-5.0.0.jar instead of xmlbeans-4.0.0.jar. Everything started to work once I switched to xmlbeans-4.0.0.jar. – MTilsted Aug 07 '21 at 13:34
