Questions tagged [docx]

.docx is the file extension for files created using the default format of Microsoft Word 2007 or later. Use this tag when you are working with .docx files programmatically, such as generating a .docx file, extracting data from one, or editing one.

The .docx format is Microsoft Office Open XML WordprocessingML, which is based around a zipped collection of eXtensible Markup Language (XML) files. WordprocessingML is mostly standardized in ECMA-376 and ISO/IEC 29500.

Formerly, Microsoft Office used binary formats (.xls, .doc, .ppt), such as BIFF (Binary Interchange File Format) for Excel. It now uses the OOXML (Office Open XML) formats. These files (.xlsx, .xlsm, .docx, .docm, .pptx, .pptm) are zipped XML.

.docx is the default Word format; it cannot contain any VBA (for security reasons, as stated by Microsoft).
.docm is the Word format that can store VBA and execute macros.
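
One rough way to tell the two apart programmatically is to look inside the zipped package for a VBA project part. The sketch below is a heuristic, not an official check: it assumes the macro storage uses the conventional part name vbaProject.bin, and the function name looks_macro_enabled is made up for illustration.

```python
import zipfile


def looks_macro_enabled(word_file):
    """Heuristic: True if any part in the package is named vbaProject.bin.

    `word_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(word_file) as package:
        return any(
            name.lower().endswith("vbaproject.bin")
            for name in package.namelist()
        )
```

A file extension can be renamed, so checking the package contents is usually a safer signal than trusting .docx versus .docm alone.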

The .docx format is a zipped file that contains the following folders and files:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //this folder contains all images embedded in the Word document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this file tells Word where the images (and other related parts) are located
+  [Content_Types].xml
\--_rels
   \  .rels
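
Because the package is an ordinary zip archive, the layout above can be inspected with any zip library. Here is a minimal sketch using Python's standard zipfile module; the function name list_parts is just illustrative, and the demo builds a tiny stand-in archive in memory rather than a real Word document.

```python
import io
import zipfile


def list_parts(docx_file):
    """Return the sorted names of all parts stored in a .docx package.

    `docx_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        return sorted(package.namelist())


if __name__ == "__main__":
    # Stand-in archive so the sketch runs without a real file on disk.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<document/>")
    for name in list_parts(buffer):
        print(name)
```

With a real file you would pass its path instead, e.g. list_parts("report.docx"), where report.docx is a placeholder name.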

The main content of a docx file resides in word/document.xml.
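
As a sketch of how that part can be reached from code, the snippet below reads word/document.xml out of the archive and parses it with Python's standard library. It assumes the main part sits at that conventional path; load_main_document is a name chosen for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the "w:" prefix in WordprocessingML.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def load_main_document(docx_file):
    """Parse word/document.xml from a .docx package and return its root element."""
    with zipfile.ZipFile(docx_file) as package:
        return ET.fromstring(package.read("word/document.xml"))
```

ElementTree expands prefixes into full namespace URIs, so the root element's tag comes back as "{" + W_NS + "}document" rather than "w:document".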

The body of a typical word/document.xml looks like this:

<w:body>
  <w:p w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidRDefault="0059122C" w:rsidP="0059122C">
    <w:r>
      <w:t>Hello </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidR="008B4316">
      <w:t>W</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r>
      <w:t>orld</w:t>
    </w:r>
    <w:bookmarkStart w:id="0" w:name="_GoBack"/>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
  <w:sectPr w:rsidR="001A6335" w:rsidRPr="0059122C" w:rsidSect="001A6335">
    <w:headerReference w:type="default" r:id="rId7"/>
    <w:footerReference w:type="default" r:id="rId8"/>
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:docGrid w:linePitch="360"/>
  </w:sectPr>
</w:body>

The top-level tag is w:body (for the whole document), which is divided into multiple w:p elements (paragraphs), followed by a w:sectPr, which defines section properties such as the page size, the margins, and the headers/footers used for that document.
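
The numbers in w:pgSz and w:pgMar are measured in twentieths of a point (1440 per inch), so the values in the example correspond to a US Letter page. A small sketch of the conversion follows; the trimmed sectPr sample and the name page_size_inches are illustrative only.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_INCH = 1440  # 20 units per point * 72 points per inch

# Trimmed copy of the section properties from the example above.
SECT_PR = f"""
<w:sectPr xmlns:w="{W_NS}">
  <w:pgSz w:w="12240" w:h="15840"/>
  <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800"/>
</w:sectPr>
"""


def page_size_inches(sect_pr):
    """Return (width, height) of the page in inches from a w:sectPr element."""
    pg_sz = sect_pr.find(f"{{{W_NS}}}pgSz")
    width = int(pg_sz.get(f"{{{W_NS}}}w")) / TWIPS_PER_INCH
    height = int(pg_sz.get(f"{{{W_NS}}}h")) / TWIPS_PER_INCH
    return width, height


print(page_size_inches(ET.fromstring(SECT_PR.strip())))  # (8.5, 11.0)
```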

Inside a w:p, there are multiple w:r (runs). Every run defines its own style (color of the text, font size, ...), and every run can contain one or more w:t (text parts).
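
The paragraph/run/text nesting can be walked with a few lines of code. The following is a minimal sketch using Python's ElementTree on a simplified copy of the example; it only looks at paragraphs that are direct children of w:body (so text inside tables is ignored), and paragraph_texts is a made-up helper name.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Simplified version of the example body: one paragraph, three runs.
BODY = """
<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:r><w:t>Hello </w:t></w:r>
    <w:r><w:t>W</w:t></w:r>
    <w:r><w:t>orld</w:t></w:r>
  </w:p>
</w:body>
"""


def paragraph_texts(body):
    """Return one string per w:p, joining the w:t text of all its runs."""
    texts = []
    for paragraph in body.findall("w:p", NS):
        parts = [t.text or "" for t in paragraph.findall("w:r/w:t", NS)]
        texts.append("".join(parts))
    return texts


print(paragraph_texts(ET.fromstring(BODY.strip())))  # ['Hello World']
```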

As you can see, a simple sentence like Hello World might be split across multiple runs and w:t elements, which makes templating quite difficult to implement.
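
To illustrate the problem, the sketch below searches the example paragraph for a phrase in two ways: once inside each individual w:t, and once in the text joined per paragraph. Both function names are invented for this example, and the sample XML is a simplified copy of the one above.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

BODY = """
<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:r><w:t>Hello </w:t></w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r><w:t>W</w:t></w:r>
    <w:proofErr w:type="spellEnd"/>
    <w:r><w:t>orld</w:t></w:r>
  </w:p>
</w:body>
"""


def found_in_one_text_node(body, phrase):
    """Naive search: the phrase must sit entirely inside a single w:t."""
    return any(phrase in (t.text or "") for t in body.iter(W + "t"))


def found_in_paragraph(body, phrase):
    """Search the text of each paragraph after joining all of its w:t parts."""
    for paragraph in body.iter(W + "p"):
        joined = "".join(t.text or "" for t in paragraph.iter(W + "t"))
        if phrase in joined:
            return True
    return False


body = ET.fromstring(BODY.strip())
print(found_in_one_text_node(body, "Hello World"))  # False
print(found_in_paragraph(body, "Hello World"))      # True
```

Finding a placeholder is therefore the easy half; replacing it in place still requires deciding which of the runs (and whose formatting) should receive the new text.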

3020 questions

113

votes

16 answers

Is there a Java API that can create rich Word documents?

I have a new app I'll be working on where I have to generate a Word document that contains tables, graphs, a table of contents and text. What's a good API to use for this? How sure are you that it supports graphs, ToCs, and tables? What are some…

asked Oct 14 '08 at 23:09

billjamesdev

votes

10 answers

How can I search a word in a Word 2007 .docx file?

I'd like to search a Word 2007 file (.docx) for a text string, e.g., "some special phrase" that could/would be found from a search within Word. Is there a way from Python to see the text? I have no interest in formatting - I just want to classify…

python ms-word openxml docx

asked Sep 22 '08 at 17:08

Gerry

votes

4 answers

Markdown to docx, including complex template

I have automated my build to convert Markdown files to DOCX files using Pandoc. I have even used a reference document for the final document's styling. The command I use is: pandoc -f markdown -t docx --data-dir=docs/rendering/ mydoc.md -o…

markdown docx pandoc

asked Jan 10 '13 at 02:12

Synesso

votes

6 answers

How to extract just plain text from .doc & .docx files?

Anyone know of anything they can recommend in order to extract just the plain text from a .doc or .docx? I've found this - wondered if there were any other suggestions?

unix extract docx doc text-extraction

asked Apr 15 '11 at 03:12

docextract

votes

5 answers

How to extract text from word file .doc,docx,.xlsx,.pptx php

There may be a scenario where we need to get the text from Word documents for future use, e.g. to search for a string in documents uploaded by users (such as CVs/resumes), and a common problem is how to get the text. Open and read a…

php ms-word docx .doc

asked Oct 21 '13 at 20:03

M Khalid Junaid

votes

7 answers

Version control for DOCX and PDF?

I've been playing around with git and hg lately and then suddenly it occurred to me that this kind of thing will be great for documents. I've a document which I edit in DOCX and export as PDF. I tried using both git and hg to version control it and…

version-control pdf docx

asked Jul 21 '10 at 11:06

Jungle Hunter

votes

4 answers

How to zip a WordprocessingML folder into readable docx

I have been trying to write a simple Markdown -> docx parser/writer, but am completely stuck with the last part, which should be the easiest: i.e. compressing the folder into a .docx that Word, or any other .docx reader, will recognize. My…

xml compression markdown docx

asked Oct 03 '09 at 15:29

Michael

votes

5 answers

Inserting Image into DocX using OpenXML and setting the size

I am using OpenXML to insert an image into my document. The code provided by Microsoft works, but makes the image much smaller: public static void InsertAPicture(string document, string fileName) { using (WordprocessingDocument…

c# image openxml docx

asked Nov 10 '11 at 16:36

LunchMarble

votes

6 answers

Knitr & Rmarkdown docx tables

When using knitr and rmarkdown together to create a word document you can use an existing document to style the output. For example in my yaml header: output: word_document: reference_docx: style.docx fig_caption: TRUE within this style…

r knitr r-markdown docx pandoc

asked Jun 07 '16 at 06:20

zacdav

votes

2 answers

How can I create a simple docx file with Apache POI?

I'm searching for a simple example code or a complete tutorial how to create a docx file with Apache POI and its underlying openxml4j. I tried the following code (with a lot of help from the Content Assist, thanks Eclipse!) but the code does not…

java docx apache-poi

asked Apr 07 '10 at 12:59

guerda

votes

3 answers

Chrome says: "Resource interpreted as Document but transferred with MIME type application/vnd.openxmlformats-officedocument.wordprocessingml.document"

I am offering a file for download from my site, which is working. However, I am noticing this behavior from Chrome. I think I have the correct MIME Type set but Chrome is showing this message and also marks the request in red. The MIME type I have…

google-chrome download mime-types docx

asked Sep 24 '14 at 21:57

Michael

votes

6 answers

Converting docx to pdf with pure python (on linux, without libreoffice)

I'm dealing with a problem trying to develop a web-app, part of which converts uploaded docx files to pdf files (after some processing). With python-docx and other methods, I do not require a windows machine with word installed, or even libreoffice…

python pdf docx pythonanywhere python-docx

asked Jun 22 '18 at 06:39

Ofer Sadan

votes

8 answers

Append multiple DOCX files together

I need to use C# programmatically to append several preexisting docx files into a single, long docx file - including special markups like bullets and images. Header and footer information will be stripped out, so those won't be around to cause any…

c# openxml docx

asked Oct 29 '08 at 17:22

ShootTheCore

votes

8 answers

Add styling rules in pandoc tables for odt/docx output (table borders)

I'm generating some odt/docx reports via markdown using knitr and pandoc and am now wondering how you'd go about formatting tables. Primarily I'm interested in adding rules (at least top, bottom and one below the header, but being able to add…

docx pandoc odt

asked Jul 25 '13 at 12:53

Tilo Wiklund

votes

9 answers

Why are .docx files being corrupted when downloading from an ASP.NET page?

I have this following code for bringing page attachments to the user: private void GetFile(string package, string filename) { var stream = new MemoryStream(); try { using (ZipFile zip = ZipFile.Read(package)) { …

asp.net httpresponse docx

asked Mar 19 '10 at 13:21

Victor Rodrigues

