
I am trying to develop an R Markdown report for my data analysis that can be knitted to both word_document and pdf_document. Bookdown works really well for captions and automatic numbering (https://bookdown.org/yihui/bookdown/). The only major issue left is how to insert page breaks that work for both.

For PDF, I use XeLaTeX from TinyTeX, and \newpage works great. For Word, I use a level-5 heading (##### page break) as the page break and customize the Heading 5 style (including a page break before it and a white font, so the text is invisible).

I could use Edit > Find... and Replace All, but I am still developing the report and need to check frequently that the output looks right in both formats, so a manual step on every build is impractical.

Is there any way I could either:

  • do the Replace All in an R function,
  • edit the TeX template so that the level-5 heading is not displayed in PDF output (\newpage is already not shown in MS Word), or
  • apply a magic command that forces a page break in all formats?
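For the first option, a minimal sketch in R. It assumes pandoc 2.0 or later (which supports raw blocks fenced as `{=openxml}`) and that the `##### page break` heading is removed from the source. `newpage_to_openxml()` is a hypothetical helper, not part of rmarkdown or knitr: it swaps each line containing only `\newpage` for a raw Word page break, so a Word-only copy of the .Rmd can be generated before knitting instead of using Find/Replace by hand.

```r
# Hypothetical helper (not part of rmarkdown or knitr): replace each line that
# contains only \newpage (or \newpage{}) with a raw OpenXML page break.
# Assumes pandoc >= 2.0, which understands fenced raw blocks like {=openxml}.
newpage_to_openxml <- function(lines) {
  # Raw Word paragraph holding a page break; pandoc passes it through to .docx
  pagebreak <- c(
    "```{=openxml}",
    "<w:p><w:r><w:br w:type=\"page\"/></w:r></w:p>",
    "```"
  )
  # A line counts as a page break only if \newpage is all it contains
  is_break <- grepl("^\\s*\\\\newpage(\\{\\})?\\s*$", lines, perl = TRUE)
  out <- character(0)
  for (i in seq_along(lines)) {
    if (is_break[i]) {
      out <- c(out, pagebreak)
    } else {
      out <- c(out, lines[i])
    }
  }
  out
}

# Usage (file names are hypothetical):
# src <- readLines("report.Rmd")
# writeLines(newpage_to_openxml(src), "report-word.Rmd")
# rmarkdown::render("report-word.Rmd", output_format = "word_document")
```

The PDF would still be knitted from the unmodified file: pandoc ignores raw openxml blocks when writing LaTeX, so the converted copy would lose its page breaks there.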

Thanks!

Here is a reproducible example R Markdown file:

---
title: "Untitled"
author: "Me"
date: "November 15, 2018"
output:
  pdf_document: default
  word_document: default
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
Some text.  

I want a page break after this.

\newpage
##### page break

This should be the first sentence of the new page.

Some more text.

Asked by David
  • Relevant answer to a similar question: https://stackoverflow.com/a/52131435/2425163. The mentioned Lua filter can be invoked by writing `pandoc_args = ['--lua-filter=` below both `pdf_document` and `word_document`. – tarleb Nov 15 '18 at 13:13
  • Thanks! It works well for docx/pdf (I only tested these two formats), including pdf_document2 and word_document2 from bookdown. Before I mark it as the accepted answer: I noticed it creates an empty line before and after the page break. Any chance I could modify the Lua filter to remove the empty lines, at least the one after the page break? – David Nov 15 '18 at 16:32
  • Also, I didn't include the filter for PDF output since \newpage works natively there. – David Nov 15 '18 at 16:34
  • The empty lines are produced by the extra paragraph which is inserted to create the line break. It should be harmless, but I can think of a way to get rid of it. I'm going to publish the re-worked code in a more central location and can ping you once it's available. – tarleb Nov 15 '18 at 16:56
  • Agreed. I was afraid that the empty line plus the space before a level-1 heading, for example, would generate too much empty space in the Word output, but it's actually a very minor concern. Thanks a lot for the answer! – David Nov 15 '18 at 18:06

2 Answers


Many thanks to tarleb for the answer. As suggested, I used their answer to this post: https://stackoverflow.com/a/52131435/2425163.

Step 1: create a text file with the following Lua code:

--- Return a block element causing a page break in the given format.
local function newpage(format)
  if format == 'docx' then
    local pagebreak = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
    return pandoc.RawBlock('openxml', pagebreak)
  elseif format:match 'html.*' then
    return pandoc.RawBlock('html', '<div style="page-break-after: always;"></div>')
  elseif format:match 'latex' then
    -- Lua patterns have no optional groups, so '(la)?tex' would not match 'latex'
    return pandoc.RawBlock('tex', '\\newpage{}')
  elseif format:match 'epub' then
    local pagebreak = '<p style="page-break-after: always;"> </p>'
    return pandoc.RawBlock('html', pagebreak)
  else
    -- fall back to insert a form feed character
    return pandoc.Para{pandoc.Str '\f'}
  end
end

-- Filter function called on each RawBlock element.
function RawBlock (el)
  -- check that the block is TeX or LaTeX and contains only \newpage or
  -- \newpage{} (in Lua patterns '%s' is whitespace and '{}' is literal)
  if el.format:match 'tex' and
     (el.text:match '^%s*\\newpage%s*$' or el.text:match '^%s*\\newpage{}%s*$') then
    -- use format-specific pagebreak marker. FORMAT is set by pandoc to
    -- the targeted output format.
    return newpage(FORMAT)
  end
  -- otherwise, leave the block unchanged
  return nil
end

Step 2: save the file as page-break.lua in the same directory as the R Markdown file.

Step 3: pass the filter to pandoc through pandoc_args in the YAML header.

Here is the corrected reproducible example (R Markdown file):

---
title: "Untitled"
author: "Me"
date: "November 15, 2018"
output:
  pdf_document: default
  word_document:
    pandoc_args:
     '--lua-filter=page-break.lua'
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

Some text.  

I want a page break after this.

\newpage

This should be the first sentence of the new page.

Some more text.

Note that this may not work with the table of contents. That is not a problem for me: I don't use the Lua filter for PDF output, and with word_document it is easy to add the table of contents afterwards directly in Word. The answer linked above also points to a solution for that problem.
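To check both outputs after every change, one option is to render all formats from the R console in one call. A minimal sketch, assuming the YAML header above (filter listed only under word_document) and the filter file name from step 2; `knit_all_formats()` is a hypothetical helper name, while `rmarkdown::render()` and its `output_format = "all"` value come from rmarkdown itself.

```r
# Hypothetical helper (not part of rmarkdown): knit the report to every
# format listed under `output:` in its YAML header, after checking that the
# Lua filter file (default name as in step 2) sits next to the .Rmd file.
knit_all_formats <- function(input, filter = "page-break.lua") {
  filter_path <- file.path(dirname(input), filter)
  if (!file.exists(filter_path)) {
    stop("Lua filter not found: ", filter_path, call. = FALSE)
  }
  # output_format = "all" tells rmarkdown::render() to produce every format
  # declared in the YAML header (here pdf_document and word_document).
  rmarkdown::render(input, output_format = "all")
}

# Usage (hypothetical file name):
# knit_all_formats("report.Rmd")
```

Failing early on a missing filter gives a clearer message than a pandoc error halfway through the Word build.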

Answered by David
  • As promised: we've published an updated and improved filter here: https://github.com/pandoc/lua-filters/tree/master/pagebreak – tarleb Nov 17 '18 at 16:46

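For PDF output, you can start every level-1 heading on a new page by adding the following to the YAML header of the R Markdown file (it uses the LaTeX titlesec package):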

   ---
   title: "Report"
   author: "Author"
   date: "Date"
   output:
     html_document:
       toc: yes
       toc_float: yes
       toc_collapsed: no
       smooth_scroll: yes
       toc_depth: 2
       number_sections: yes
       theme: cerulean
       fig_caption: yes
       df_print: paged
     pdf_document:
       toc: yes
       toc_depth: '2'
       number_sections: yes
       latex_engine: xelatex
       pandoc_args: [
         # pandoc_args for a page break before each H1 heading
         "-V", "header-includes:\\usepackage{titlesec}\\newcommand{\\sectionbreak}{\\clearpage}"]
   header-includes:
     - \usepackage{fontspec}
     - \setmainfont{Calibri}
   fontsize: 12pt
   editor_options:
     markdown:
       wrap: 72
   bibliography: references.bib
   link-citations: true
   ---