I am trying to open and read a .docx file using Ruby, and extract portions of the text and objects/images and save into another (non .docx) file.
Using Nokogiri, I am able to properly extract text and do my partitioning of the document into the sections I want via:
zip = Zip::File.open file_path
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_stream)
wt = xml.root.xpath("//w:t", {"w" =>
"http://schemas.openxmlformats.org/wordprocessingml/2006/main"})
If I do instead:
xml.root.xpath("//w:body", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"})
I can see the objects in the xml as:
<w:object w:dxaOrig="1440" w:dyaOrig="400">
<v:shapetype id="_x0000_t75" coordsize="21600,21600" o:spt="75" o:preferrelative="t" path="m@4@5l@4@11@9@11@9@5xe" filled="f" stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/>
<v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/>
</v:formulas>
<v:path o:extrusionok="f" gradientshapeok="t" o:connecttype="rect"/>
<o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype>
<v:shape id="_x0000_i1025" type="#_x0000_t75" style="width:1in;height:20.4pt" o:ole="">
<v:imagedata r:id="rId4" o:title=""/>
</v:shape>
<o:OLEObject Type="Embed" ProgID="Equation.DSMT4" ShapeID="_x0000_i1025" DrawAspect="Content" ObjectID="_1563800156" r:id="rId5"/>
</w:object>
but not sure how to convert that to something that can be later used to display in html. Converting to svg such that it could be displayed along with the text in html would be ideal.
Thanks for any help.