I have some HTML tables that wrap normally when rendered in a browser. But when I try to convert them to PDF using pdftex, the tables are cut off at the margins and are not wrapped. How do I make pandoc wrap the HTML tables?

This is not the same as the Markdown problem; the tables are pure HTML.

CMCDragonkai

1 Answer

The issue is that LaTeX requires explicit specification of column widths if you want the cells to wrap, so you need to manually specify those somehow (in markdown you would do this using multiline or grid tables).

Pandoc's HTML Reader supports relative width attributes on col elements.

pandoc -f html -t latex << EOF
<table>
  <colgroup>
    <col width="10%">
    <col width="90%">
  </colgroup>
  <tr>
    <td>3476896</td>
    <td>My first HTML</td>
  </tr>
</table>
EOF

\begin{longtable}[c]{@{}ll@{}}
\toprule
\begin{minipage}[t]{0.09\columnwidth}\raggedright\strut
3476896
\strut\end{minipage} &
\begin{minipage}[t]{0.85\columnwidth}\raggedright\strut
My first HTML
\strut\end{minipage}\tabularnewline
\bottomrule
\end{longtable}

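Notice the \columnwidth in the LaTeX output: each cell is wrapped in a minipage whose width is a fraction of \columnwidth, which is what allows the text to wrap.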

If you have no control over the HTML, you can write a Pandoc filter that modifies the document's AST and sets some arbitrary column widths that add up to 100%. You could also revive this old thread on pandoc-discuss, where jgm (aka fiddlosopher) wrote:

The main reason is that with more complex tables, we need information about relative column widths, which the HTML document lacks. But I think I'm becoming convinced that we should just guess at these.

Or file a feature request for this.
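
As a rough illustration of the filter idea above, here is a minimal sketch of a JSON filter that uses only Python's standard library and gives every table equal column widths. It rests on assumptions that should be checked against your pandoc version: that a Table node in the JSON AST carries six fields with `[alignment, width]` column specs (pandoc 2.10 and later) or five fields with a plain list of width fractions (older releases). The function names are made up for this example, and equal widths are just one arbitrary choice; any existing widths are overwritten.

```python
import json
import sys


def set_equal_widths(node):
    """Walk a pandoc JSON AST and give each table equal column widths, in place."""
    if isinstance(node, list):
        for item in node:
            set_equal_widths(item)
    elif isinstance(node, dict):
        if node.get("t") == "Table":
            fields = node["c"]
            if len(fields) == 6:
                # Assumed newer layout: [attr, caption, colspecs, head, bodies, foot]
                colspecs = fields[2]
                for spec in colspecs:
                    spec[1] = {"t": "ColWidth", "c": 1.0 / len(colspecs)}
            elif len(fields) == 5:
                # Assumed older layout: [caption, alignments, widths, headers, rows]
                n = len(fields[2])
                if n:
                    fields[2] = [1.0 / n] * n
        # Keep walking so tables nested inside cells are handled too.
        for value in node.values():
            set_equal_widths(value)
    return node


def main():
    """Filter entry point: read the JSON AST on stdin, write the result to stdout."""
    doc = json.load(sys.stdin)
    json.dump(set_equal_widths(doc), sys.stdout)
```

To use it as a filter, add a call to `main()` at the end of the file, make the file executable, and pass it to pandoc with `--filter` (for example `--filter ./equal_widths.py`, where the file name is hypothetical).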

mb21
  • I guess that's it! But how would you apply this if you don't have control over the HTML? That is, the HTML is scraped? – CMCDragonkai Oct 27 '15 at 11:55
  • @CMCDragonkai Even if you're not the one who generates it, you always have control over the HTML -- it's text that you're free to edit. You can use e.g., Python's BeautifulSoup library to edit various aspects of the HTML. – BallpointBen Feb 26 '21 at 19:38
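
One way to pre-process scraped HTML along these lines is sketched below. It uses Python's standard-library `html.parser` and `re` rather than BeautifulSoup (which would work just as well), counts the cells in the widest row of each table, and injects a `<colgroup>` with equal integer percentage widths right after each opening `<table>` tag. The names `ColumnCounter` and `add_colgroups` are invented for this example, and the sketch assumes simple tables: no nesting, no `colspan`, and no existing `<colgroup>`.

```python
import re
from html.parser import HTMLParser


class ColumnCounter(HTMLParser):
    """Record, for each <table>, the largest number of cells found in one row."""

    def __init__(self):
        super().__init__()
        self.max_cols = []    # one entry per table, in document order
        self._row_cells = 0   # cells seen so far in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.max_cols.append(0)
        elif tag == "tr":
            self._row_cells = 0
        elif tag in ("td", "th") and self.max_cols:
            self._row_cells += 1
            self.max_cols[-1] = max(self.max_cols[-1], self._row_cells)


def add_colgroups(html):
    """Return html with an equal-width <colgroup> added to every table."""
    counter = ColumnCounter()
    counter.feed(html)
    counter.close()
    counts = iter(counter.max_cols)

    def inject(match):
        n = next(counts, 0)
        if n == 0:
            return match.group(0)  # no cells found: leave the tag untouched
        width = max(1, 100 // n)   # integer percentages, as in the example above
        cols = "".join(f'<col width="{width}%">' for _ in range(n))
        return f"{match.group(0)}<colgroup>{cols}</colgroup>"

    return re.sub(r"<table\b[^>]*>", inject, html, flags=re.IGNORECASE)
```

The result of `add_colgroups(scraped_html)` can then be piped to `pandoc -f html -t latex` as in the answer. Equal widths are only a starting point; a column of short IDs next to a column of long text would probably read better with uneven widths such as the 10%/90% split shown above.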