6

I am using officer (I used to use ReporteRs) within a loop to create 150 unique documents. However, I need these documents to be exported from R as both Word .docx files AND PDFs.

Is there a way to export the document created with officer to a pdf?

Amanda

3 Answers

9

That's possible, but the solution I have depends on LibreOffice. Here is the code I am using; I hope it helps. The LibreOffice path is hard-coded, so you will probably have to adapt or improve how the command in cmd_ is built for your system.

The code converts a PPTX or DOCX file to PDF.

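# Convert a DOCX or PPTX file to PDF with LibreOffice in headless mode.
# The soffice path below is the macOS default; adapt it to your system.
office_shot <- function( file, wd = getwd() ){
  soffice <- "/Applications/LibreOffice.app/Contents/MacOS/soffice"
  # shQuote() protects paths that contain spaces
  cmd_ <- sprintf(
    "%s --headless --convert-to pdf --outdir %s %s",
    shQuote(soffice), shQuote(wd), shQuote(file) )
  system(cmd_)

  # LibreOffice writes <name>.pdf into wd; return the path to that file
  pdf_file <- gsub("\\.(docx|pptx)$", ".pdf", basename(file))
  file.path(wd, pdf_file)
}
office_shot(file = "your_presentation.pptx")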
David Gohel
  • Thanks! I'm not familiar with using system() and am having trouble figuring out how to get it to make the pdf and export that without then going on to making a png. Every time I run office_shot() I am getting the error message: `Error in normalizePath(pdf, mustWork = TRUE) : path[1]="/Users/amanda/Documents/reports/ACAD_2014.pdf": No such file or directory`. But of course not, because they are .docx files in the folder that I need to convert to pdfs – Amanda Sep 18 '18 at 15:43
  • Hi David, do you have an idea how to use this on windows? – Arcoutte Jan 27 '20 at 13:30
  • I don't work on windows but I am pretty sure that by changing the path of soffice to a valid path it should work. – David Gohel Jan 27 '20 at 13:45
  • Quick response :-), thanks, I've found it, had to add .exe and quote " " the path – Arcoutte Jan 27 '20 at 14:03
  • Hi Arcoutte, I tested under Windows by replacing to `sprintf(" 'D:/Program Files/LibreOffice/program/soffice.exe' --headless --convert-to pdf --outdir %s %s", wd, file) system(cmd_)`, but it's not working out. Did I make mistakes in the path? – ah bon Dec 23 '21 at 11:04
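The quoting problem discussed in these comments can be sidestepped with system2(), which quotes the command itself, so a path containing spaces does not need hand-written quotes. The following is only a sketch: docx_to_pdf() and pdf_path_for() are hypothetical helper names, and the default soffice location is an assumption that must be adjusted to the actual install.

```r
# Hypothetical helper: the PDF path LibreOffice is expected to write.
pdf_path_for <- function(file, outdir = getwd()) {
  file.path(outdir,
            sub("\\.(docx|pptx)$", ".pdf", basename(file), ignore.case = TRUE))
}

# Hypothetical wrapper around the LibreOffice command line.
# The default `soffice` path is an assumption; adjust it for your machine.
docx_to_pdf <- function(file,
                        outdir = getwd(),
                        soffice = "C:/Program Files/LibreOffice/program/soffice.exe") {
  if (!file.exists(soffice)) stop("soffice not found at: ", soffice)

  # system2() quotes the command; the arguments still need shQuote()
  args <- c("--headless", "--convert-to", "pdf",
            "--outdir", shQuote(outdir), shQuote(file))
  status <- system2(soffice, args)
  if (status != 0) warning("soffice exited with status ", status)

  pdf_path_for(file, outdir)
}
```

A call such as docx_to_pdf("report.docx") would then return the path of the PDF that LibreOffice should have written to the working directory.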
3

I've been using RDCOMClient to convert the .docx files I create with officer to PDFs. This requires Windows with Microsoft Word installed.

library(RDCOMClient)

file <- "C:/path/to your/doc.docx"
wordApp <- COMCreate("Word.Application") #creates COM object
wordApp[["Documents"]]$Open(Filename=file) #opens your docx in wordApp
wordApp[["ActiveDocument"]]$SaveAs("C:/path/to your/doc.pdf", FileFormat=17) #saves as PDF
wordApp$Quit() #quits the COM Word application

I found the FileFormat=17 value (PDF) here: https://learn.microsoft.com/en-us/office/vba/api/word.wdexportformat

I've been able to put the above in a loop to convert multiple docx's to PDFs quickly, too.
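A loop along those lines might look like the sketch below. It is not the code I actually ran: convert_all_with_word() and docx_to_pdf_path() are hypothetical names, the use of absolute paths with backslashes is an assumption about what Word accepts most reliably, and it only works on Windows with Word and RDCOMClient installed.

```r
# Hypothetical helper: map each .docx path to a .pdf path next to it.
docx_to_pdf_path <- function(docx) {
  sub("\\.docx$", ".pdf", docx, ignore.case = TRUE)
}

# Sketch: convert many documents with a single Word instance.
convert_all_with_word <- function(docx_files) {
  if (!requireNamespace("RDCOMClient", quietly = TRUE)) {
    stop("RDCOMClient is not installed")
  }
  # Assumption: Word gets absolute paths, since its working
  # directory may differ from R's
  docx_files <- normalizePath(docx_files, winslash = "\\", mustWork = TRUE)
  pdf_files <- docx_to_pdf_path(docx_files)

  wordApp <- RDCOMClient::COMCreate("Word.Application")
  on.exit(wordApp$Quit(), add = TRUE)  # close Word even if a file fails

  for (i in seq_along(docx_files)) {
    doc <- wordApp[["Documents"]]$Open(Filename = docx_files[i])
    doc$SaveAs(pdf_files[i], FileFormat = 17)  # 17 = PDF, as above
    doc$Close()
  }
  invisible(pdf_files)
}
```

Opening Word once and reusing it for every file avoids paying the application start-up cost for each document.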

Hope this helps!

  • Not available for R 3.6 for those using it. – Martin Dec 10 '20 at 13:22
  • True, and since I wrote this I have abandoned RDCOMClient. I use {reticulate} to access Python's {win32com.client} and from there do basically the exact same thing. It's pretty cumbersome if you don't already have R/{reticulate}/Python already, but it works in just the same way. – Zachary Smithingell Dec 11 '20 at 00:17
3

You can convert your docx files to PDF with the convert_to_pdf function from the docxtractr package.

Note that this function uses LibreOffice to do the conversion, so you have to install LibreOffice first and provide the path to soffice.exe. Read more about paths for different OS here.

Here is a simple example of how to convert several docx documents to PDF on a Windows machine. I have Windows 10 and LibreOffice 6.4 installed. Imagine that you have X Word documents stored in the data folder and you want to create the same number of PDFs in the data/pdf folder (you have to create the pdf folder beforehand).

library(dplyr)
library(purrr)
library(docxtractr)

# Tell docxtractr where LibreOffice is installed
set_libreoffice_path("C:/Program Files/LibreOffice/program/soffice.exe")

# 1) List of Word documents
words <- list.files("data",
                    pattern = "\\.docx$",
                    full.names = TRUE)

# 2) Custom function
word2pdf <- function(path){
  
  # Extract the file name without its directory and extension
  name <- tools::file_path_sans_ext(basename(path))
  
  convert_to_pdf(path,
                 pdf_file = paste0("data/pdf/",
                                   name,
                                   ".pdf"))
  
}

# 3) Convert
words %>%
  map(~word2pdf(.x))
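If you would rather not create the pdf folder by hand, the preparation step can be sketched as below. plan_conversions() is a hypothetical helper, not part of docxtractr; it only creates the output folder and pairs each .docx with its target PDF path, leaving the conversion itself to convert_to_pdf.

```r
# Hypothetical helper: pair every .docx in `in_dir` with a target PDF
# path in `out_dir`, creating `out_dir` if it does not exist yet.
plan_conversions <- function(in_dir = "data",
                             out_dir = file.path(in_dir, "pdf")) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  docx <- list.files(in_dir, pattern = "\\.docx$", full.names = TRUE)
  pdf <- file.path(out_dir, sub("\\.docx$", ".pdf", basename(docx)))

  data.frame(docx = docx, pdf = pdf, stringsAsFactors = FALSE)
}

# Possible usage (requires docxtractr and LibreOffice):
# plan <- plan_conversions("data")
# for (i in seq_len(nrow(plan))) {
#   docxtractr::convert_to_pdf(plan$docx[i], pdf_file = plan$pdf[i])
# }
```

Keeping the planning separate from the conversion makes it easy to inspect the input and output paths before LibreOffice is started.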
atsyplenkov