
Pandoc doesn't render HTML tables well in docx documents. I get the content of a request and render it using a template file. Then I use pypandoc like this:

 from django.shortcuts import render
 import pypandoc

 response = render(
   request,
   'template.html',
   {
     "field1": f1,
     "field2": f2,
   }
 )

 pypandoc.convert(source=response.content, format='html', to='docx', outputfile='output.docx')

The template.html file contains a table. In the docx file, I get a table with its content separated below it. Are there extra parameters I should pass to solve this? Or does pandoc's conversion not support tables well yet? Is there a working example? Or is there an easier way to do this?


EDIT 1

Here is a more concise example, a Python test snippet:

$ cat test-table.py 
#!/usr/bin/env python
test_table = """
 <p>Table with colgroup and col</p>
 <table border="1">
   <colgroup>
     <col style="background-color: #0f0">
     <col span="2">
   </colgroup>
   <tr>
     <th>Lime</th>
     <th>Lemon</th>
     <th>Orange</th>
   </tr>
   <tr>
     <td>Green</td>
     <td>Yellow</td>
     <td>Orange</td>
   </tr>
   <tr>
     <td>Fruit</td>
     <td>Fruit</td>
     <td>Fruit</td>
   </tr>
 </table>

   """
print("[test_table]")
print(test_table)
import pypandoc
pypandoc.convert(source=test_table, format='html', to='docx', outputfile='test-table.docx')  

# Write the same fragment to an HTML file
with open('test-table.html', 'w') as fh:
    fh.write(test_table)
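To see what this table declares versus what it actually contains, here is a small sketch using only the standard library `html.parser` module (the class name `TableColumnCounter` is made up for this example). It counts the `<col>` elements, the total of their `span` attributes, and the cells in each row. For this fragment there are two `<col>` tags but three columns once `span="2"` is honored. Whether pandoc 2.2.1 counts `<col>` tags without honoring `span` is an assumption worth checking, not something confirmed here.

```python
from html.parser import HTMLParser


class TableColumnCounter(HTMLParser):
    """Count <col> tags, their total span, and the cells in each <tr>.

    Assumes a single, non-nested table; colspan on cells is not considered.
    """

    def __init__(self):
        super().__init__()
        self.col_elements = 0    # number of <col> tags seen
        self.col_span_total = 0  # sum of span attributes (missing span = 1)
        self.cells_per_row = []  # number of <td>/<th> tags in each <tr>
        self._in_row = False

    def handle_starttag(self, tag, attrs):
        if tag == "col":
            self.col_elements += 1
            span = dict(attrs).get("span") or "1"
            self.col_span_total += int(span)
        elif tag == "tr":
            self.cells_per_row.append(0)
            self._in_row = True
        elif tag in ("td", "th") and self._in_row:
            self.cells_per_row[-1] += 1

    def handle_endtag(self, tag):
        if tag == "tr":
            self._in_row = False


# Same structure as test_table in test-table.py
SAMPLE = """
<table border="1">
  <colgroup>
    <col style="background-color: #0f0">
    <col span="2">
  </colgroup>
  <tr><th>Lime</th><th>Lemon</th><th>Orange</th></tr>
  <tr><td>Green</td><td>Yellow</td><td>Orange</td></tr>
  <tr><td>Fruit</td><td>Fruit</td><td>Fruit</td></tr>
</table>
"""

counter = TableColumnCounter()
counter.feed(SAMPLE)
print(counter.col_elements)    # 2
print(counter.col_span_total)  # 3
print(counter.cells_per_row)   # [3, 3, 3]
```

If the first number differs from the other two, a reader that relies on `<col>` tags alone would disagree with the rows about how many columns the table has.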

I open the HTML file:

$ firefox test-table.html 

and get the following HTML page:

[Screenshot: the table as rendered in Firefox]

which is good. I also get the following docx document:

$ libreoffice test-table.docx 

[Screenshot: the document as opened in LibreOffice]

That is not good.

I exported the docx file to PDF and got the following output:

$ evince test-table.pdf 

[Screenshot: the exported PDF in Evince]

Note that the images show the whole page; there is nothing more to scroll to. Data from the second column doesn't appear at all. Any ideas?
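To tell whether a column is missing from the document itself or only hidden by the layout, one can inspect the table markup inside the .docx. A .docx file is a ZIP archive whose main body is `word/document.xml`; in WordprocessingML, a table's column grid is declared by `w:tblGrid/w:gridCol` elements and each row's cells are `w:tc` elements. This sketch (the function name `docx_table_shapes` is hypothetical) reports both for every table, using only `zipfile` and `xml.etree.ElementTree`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_table_shapes(path):
    """Return a list of (grid_columns, cells_per_row) tuples, one per table.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    shapes = []
    for tbl in root.iter(W + "tbl"):
        # Columns declared in the table grid
        grid = len(tbl.findall(f"{W}tblGrid/{W}gridCol"))
        # Cells actually present in each direct child row
        rows = [len(tr.findall(W + "tc")) for tr in tbl.findall(W + "tr")]
        shapes.append((grid, rows))
    return shapes
```

Calling `docx_table_shapes("test-table.docx")` on the generated file would show whether pandoc wrote a two-column or three-column grid, and how many cells each row really holds.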


EDIT 2

Pandoc has been installed in a conda environment:

$ type pandoc
pandoc is hashed (/home/kaligne/local/miniconda3/bin/pandoc)

The Pandoc version is:

$ pandoc -v
pandoc 2.2.1
Compiled with pandoc-types 1.17.4.2, texmath 0.11, skylighting 0.7.0.2
Default user data directory: /home/kaligne/.pandoc
Copyright (C) 2006-2018 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

EDIT 3

I converted the docx file to plain text:

$ docx2txt test-table.docx
$ cat test-table.txt 
Table with colgroup and col
Lime
Lemon
Green
Yellow
Fruit
Fruit

We can see that all the data are present. So I guess this has to do with how the information is displayed.
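One workaround worth trying, under an unverified assumption: if the `span="2"` on the second `<col>` is what confuses the conversion (the docx2txt output above lists two values per row, Lime/Lemon, Green/Yellow, Fruit/Fruit, for a three-column table), the HTML can be preprocessed so that each `<col span="N">` becomes N plain `<col>` elements before it is handed to pypandoc. The function name `expand_col_spans` is hypothetical, and the regular expression only handles double-quoted `span` values, which covers the template in this question but not arbitrary HTML.

```python
import re

# Matches a <col ...> tag carrying span="N"; <colgroup> is excluded by \b.
_COL_SPAN = re.compile(r'<col\b([^>]*?)\s+span="(\d+)"([^>]*)>', re.IGNORECASE)


def expand_col_spans(html):
    """Replace each <col ... span="N" ...> with N copies of <col ...> (span removed)."""
    def repl(match):
        attrs = match.group(1) + match.group(3)  # every attribute except span
        return f"<col{attrs}>" * int(match.group(2))

    return _COL_SPAN.sub(repl, html)


print(expand_col_spans('<colgroup><col style="a"><col span="2"></colgroup>'))
```

The preprocessed string can then go through the same call as before, e.g. `pypandoc.convert(source=expand_col_spans(test_table), format='html', to='docx', outputfile='test-table.docx')`. Removing the `<colgroup>` block entirely would be a cruder alternative to test the same hypothesis.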
