
LaTeX will keep all rows of a table on the same page if possible. However, I found that if I render an RMarkdown document into a PDF file, a table may span two pages if it is near the end of a page. This is odd to me because I believe the RMarkdown file is actually converted to a LaTeX file before the PDF file is generated.

  ---
  title       : "Table"
  output      : 
    pdf_document
  ---

  # Section 1

  # Section 2

  # Section 3

  # Section 4

  # Section 5

  # Section 6

  # Section 7

  # Section 8

  # Section 9

  # Section 10

  # Section 11

  # Section 12

  # Section 13

  Column 1          |     Column 2 |
  -------------     | -------------|
  1) Cell           |     Cell     |
  2) Cell           |     Cell     |
  3) Cell           |     Cell     |
  4) Cell           |     Cell     |
  5) Cell           |     Cell     |
  6) Cell           |     Cell     |
  7) Cell           |     Cell     |
  8) Cell           |     Cell     |
  9) Cell           |     Cell     |
  10) Cell          |     Cell     |
  11) Cell          |     Cell     |
  12) Cell          |     Cell     |
  13) Cell          |     Cell     |
  14) Cell          |     Cell     |
  15) Cell          |     Cell     |
  16) Cell          |     Cell     |
  17) Cell          |     Cell     |
  18) Cell          |     Cell     |

If this is saved in temp.Rmd and then converted to a PDF file with rmarkdown::render("temp.Rmd", output_file = "temp.pdf"), the first twelve rows appear on page 1 and the remaining rows appear on page 2:

[Screenshot: a table split across two pages]

Is it possible to ask render (or pandoc?) to add extra space before a table when necessary, so that all rows of the table appear on the same page?

sfcheung
  • Can you also add a reproducible example? – Roman Luštrik Dec 30 '14 at 08:44
  • Sorry for my mistake. I've just added a sample Rmd file to illustrate the problem. – sfcheung Dec 30 '14 at 09:08
  • This is because `pandoc` uses the `longtable` environment for tables by default, while in LaTeX you are probably using a simple `tabular`. But you still have some options to tweak the `pandoc`-generated LaTeX tables with custom stylesheets: http://johnmacfarlane.net/pandoc/README.html#general-writer-options – daroczig Dec 30 '14 at 12:36
  • Thanks for the suggestion. Now I understand what happened. Please pardon my ignorance. I know a little bit about HTML and CSS, but I am new to both RMarkdown and LaTeX. I checked that guide and now understand better how I can tweak the format for HTML and some other output formats by using stylesheets. However, I don't quite understand how to tweak the format of the PDF output in a similar way. I suppose `pandoc` will create a LaTeX file first. Should I modify the default template for PDF files? – sfcheung Dec 30 '14 at 17:20
    @sfcheung you are right, `pandoc` first creates a `tex` file before running `pdflatex` on it, and you could also end up with a `tex` file instead of `pdf` if needed (for manual edits). But you'd better try to modify (fork) the default LaTeX template used by `pandoc`, or revert to an older version of `pandoc` which did not use `longtable` yet (pre 1.1): http://johnmacfarlane.net/pandoc/releases.html – daroczig Dec 30 '14 at 23:02
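One way to confirm daroczig's diagnosis is to inspect the LaTeX that pandoc generates and look for `longtable` environments. The following is a minimal Python sketch, assuming Python 3.7+ and pandoc on your PATH; `markdown_to_latex` and `count_longtables` are hypothetical helper names, not part of pandoc or rmarkdown. (From R Markdown itself, setting `keep_tex: true` under `pdf_document` keeps the intermediate .tex file so you can inspect it directly.)

```python
import re
import subprocess


def markdown_to_latex(md_path: str) -> str:
    # Ask pandoc to convert Markdown to a LaTeX body fragment on stdout.
    # Assumes pandoc is installed and on PATH; "-f markdown" forces the
    # input format so a .Rmd file without R chunks is read as Markdown.
    result = subprocess.run(
        ["pandoc", md_path, "-f", "markdown", "-t", "latex"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def count_longtables(latex: str) -> int:
    # Count longtable environments; unlike tabular, LaTeX is allowed
    # to break these across pages.
    return len(re.findall(r"\\begin\{longtable\}", latex))
```

For example, `count_longtables(markdown_to_latex("temp.Rmd"))` should be nonzero for the sample document if your pandoc version emits `longtable` for pipe tables.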

2 Answers


As was suggested in the comments, the problem is that the default LaTeX template for pandoc uses longtable (normal LaTeX tables don't split over pages). If you don't feel up to creating your own template, you can just modify the default.

Vanilla Pandoc

You can use knitr to produce a normal Markdown file. Then, you can use pandoc to produce the PDF/TeX file using another LaTeX template via

pandoc --template=mytemplate.tex -o myfile.pdf myfile.md

The easiest way to set up a new template is by modifying the default one, which you can get pandoc to dump to the console for you:

pandoc --print-default-template=latex

Then you need to change the line \usepackage{longtable,booktabs} to \usepackage{booktabs}.

If you're on OS X or Linux, then you can use sed and output redirection to directly generate a template without longtable:

pandoc --print-default-template=latex | sed 's/longtable,//' > mytemplate.tex
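The sed one-liner needs a Unix shell. On Windows (or anywhere sed isn't available), a small Python script can make the same substitution. This is a sketch assuming Python 3.7+ and pandoc on your PATH; `strip_longtable` and `write_template` are hypothetical names. Like the sed command, it only removes the first `longtable,` on each line, and it assumes the default template still contains `\usepackage{longtable,booktabs}`, which may not hold for every pandoc version.

```python
import subprocess


def strip_longtable(template: str) -> str:
    # Equivalent of sed 's/longtable,//': remove the first "longtable,"
    # on each line, e.g. \usepackage{longtable,booktabs} -> \usepackage{booktabs}
    return "\n".join(line.replace("longtable,", "", 1) for line in template.split("\n"))


def write_template(path: str = "mytemplate.tex") -> None:
    # Dump pandoc's default LaTeX template (requires pandoc on PATH)
    default = subprocess.run(
        ["pandoc", "--print-default-template=latex"],
        capture_output=True,
        encoding="utf-8",
        check=True,
    ).stdout
    # Write the edited copy, to be used with --template=mytemplate.tex
    with open(path, "w", encoding="utf-8") as f:
        f.write(strip_longtable(default))
```

Calling `write_template()` produces `mytemplate.tex` in the current directory. Note that, as the 2016 edit to this answer explains, newer pandoc versions still emit `longtable` environments, so this template alone may not be enough there.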

RStudio

If you're doing this from RStudio, then the easiest option is probably to just change the default template. (Recent releases of RStudio bundle their own copy of pandoc, so they may behave differently from a system-wide pandoc.) If you look in the "R Markdown" build/status window, you'll see something like this:

output file: rmarkdown.knit.md

/Applications/RStudio.app/Contents/MacOS/pandoc/pandoc rmarkdown.utf8.md --to latex --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash-implicit_figures --output rmarkdown.pdf --template /Library/Frameworks/R.framework/Versions/3.0/Resources/library/rmarkdown/rmd/latex/default.tex --highlight-style tango --latex-engine /usr/texbin/pdflatex --variable 'geometry:margin=1in' 

Output created: rmarkdown.pdf

(I did this example on a Mac; on Windows or Linux, the paths will look different.) The template is listed in the command (the --template argument), and you can modify it as above. This will, of course, change the behavior for all documents produced via RStudio. To my knowledge, there's currently no publicly facing option to change the template used, but this may change, as document templates seem to be an area of active work in recent releases.

EDIT (2016-05-05):

It seems that the use of longtable is hard-coded in recent versions of pandoc, so removing longtable from the preamble will cause LaTeX errors (such as "Environment longtable undefined"). You can get around this by using a filter.

Save the linked Python script, then use it as follows.

Vanilla Pandoc

Add the --filter path/to/filter.py flag to your pandoc invocation.

RStudio

Modify your YAML block to pass the extra pandoc arguments to pdf_document:

---
title       : "Table"
output      : 
    pdf_document:
      pandoc_args: ["--filter", "path/to/filter.py"]
---

As noted in the link above, this will produce plain LaTeX tables, which means no support for footnotes in tables.

Livius
    This is a nice idea, but it seems Pandoc will use a `longtable` environment to create the table, so if I don't load the `longtable` package I get this error: ```pandoc: Error producing PDF from TeX source. ! LaTeX Error: Environment longtable undefined. See the LaTeX manual or LaTeX Companion for explanation. Type H for immediate help. ... l.481 \begin{longtable} ``` – Arthur Apr 04 '15 at 13:02
  • Unfortunately this does not work on the latest version of pandoc. I get "Undefined control sequence" when generating a PDF using the filter. – Natalie Adams May 21 '16 at 03:31

The cleanest way would be to add a page break (\newpage or \pagebreak) before the table, although this does not adapt if you later edit text that moves the position of the table. I guess the time to do this would be when you're finished editing the document and have checked a test output for ugly breaks, right before generating the final output.

This answer to a related question is already on SO. Also, apparently \pagebreak is:

actually a LaTeX command, rather than a Markdown one, but most … markdown-to-pdf engines … use LaTex and will accept it.

Dave Everitt
    Thanks. This is something I would try for a document I am working on. But as you pointed out, this requires manual editing after examining the output, and it would not work well if the preceding material changes frequently. It would be more efficient if the table could be repositioned automatically, as `tabular` does, as mentioned by @daroczig. – sfcheung Dec 30 '14 at 17:14