
Is it possible to convert text files, such as a Word document or a plain text file, to PDF using R? I thought of converting the file to .Rmd and then to PDF using this code:

require(rmarkdown)
my_text <- readLines("C:/.../track.txt")
cat(my_text, sep="  \n", file = "my_text.Rmd")
render("my_text.Rmd", pdf_document())

But it doesn't work and shows this error:

> Error: Failed to compile my_text.tex.
In addition: Warning message:
running command '"pdflatex" -halt-on-error -interaction=batchmode "my_text.tex"' had status 127 

Is there any other solution?

zx8754
Mouna Jmii
  • What OS are you working on? – Carl Boneri Mar 05 '18 at 15:26
  • I am using Windows 7 – Mouna Jmii Mar 05 '18 at 15:30
  • You might need to install [`MikTeX`](https://miktex.org/download) and [`pandoc`](https://github.com/jgm/pandoc/releases/tag/2.1.2) – Tung Mar 05 '18 at 15:39
  • "text files such as word document or text document" - different types of file will need a different procedure. You may like to narrow the scope of your question – dww Mar 05 '18 at 16:43
  • Here is an answer, in case you are still working on this: https://stackoverflow.com/a/46658645/15027157 The idea of that post is to convert the .docx to HTML and then to PDF. After working on this for days, it is the only approach I found. – Ralph Aug 02 '21 at 07:20
  • You can check the following answers: https://stackoverflow.com/questions/49113503/how-to-convert-docx-to-pdf – Emmanuel Hamel Apr 19 '23 at 01:12
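
The status 127 in the error above is what R's `system()` reports when a command could not be run at all, which usually means `pdflatex` is not installed or not on the PATH. A quick check, sketched here with base R only (the helper name `missing_tools` is made up for illustration), lists which of the required executables R cannot find:

```r
# Hypothetical helper: return the names of executables that R cannot find on the PATH.
missing_tools <- function(tools = c("pdflatex", "pandoc")) {
  found <- Sys.which(tools)        # named character vector; "" where nothing is found
  names(found)[!nzchar(found)]
}

missing <- missing_tools()
if (length(missing) > 0) {
  message("Not found on PATH: ", paste(missing, collapse = ", "))
} else {
  message("pdflatex and pandoc were both found.")
}
```

If `pdflatex` shows up as missing, installing a LaTeX distribution, as suggested in the comments, is the likely fix. A missing `pandoc` is less conclusive, because RStudio ships its own copy that is not necessarily on the PATH.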

3 Answers

6

.txt to .pdf

Install wkhtmltopdf and then from R run the following. Change the first three lines as appropriate depending on where wkhtmltopdf is on your system and depending on the input and output file paths and names.

wkhtmltopdf <- "C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe"
input <- "in.txt"
output <- "out.pdf"
cmd <- sprintf('"%s" "%s" -o "%s"', wkhtmltopdf, input, output)
shell(cmd)
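
wkhtmltopdf is designed around HTML input. If the plain text does not come out the way you want (for example, if line breaks are lost), one possible workaround is to wrap the text in a minimal HTML page first and convert that instead. The sketch below uses base R only; `txt_to_html()` and the file names are illustrative, not part of wkhtmltopdf:

```r
# Illustrative helper: wrap plain-text lines in a minimal HTML page.
# <pre> keeps the original line breaks and spacing.
txt_to_html <- function(lines) {
  # Escape the characters that have a special meaning in HTML ("&" must come first)
  lines <- gsub("&", "&amp;", lines, fixed = TRUE)
  lines <- gsub("<", "&lt;", lines, fixed = TRUE)
  lines <- gsub(">", "&gt;", lines, fixed = TRUE)
  c("<!DOCTYPE html>",
    "<html><body>",
    "<pre>", lines, "</pre>",
    "</body></html>")
}

# Example with a throwaway file; replace it with your own input
input <- tempfile(fileext = ".txt")
writeLines(c("a < b & c", "second line"), input)

html_file <- sub("\\.txt$", ".html", input)
writeLines(txt_to_html(readLines(input)), html_file)
```

`html_file` can then take the place of `input` in the snippet above.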

.docx to .pdf

Install pandoc, modify the first three lines below as needed and run. How well this works may vary depending on your input.

pandoc <- "C:\\Program Files (x86)\\Pandoc\\pandoc.exe"
input <- "in.docx"
output <- "out.pdf"
cmd <- sprintf('"%s" "%s" -o "%s"', pandoc, input, output)
shell(cmd)
G. Grothendieck
  • I am still getting an error; I think it is related to my computer: Warning messages: 1: running command 'C:\Windows\system32\cmd.exe /c "C://Program Files (x86)/Pandoc/pandoc.exe" "C:/Users/../TMP-GAF01 - Curriculum Vitae_MJ.doc" -o "C:/Users/../CV_J.pdf"' had status 1 2: In shell(cmd) : execution of '"C://Program Files (x86)/Pandoc/pandoc.exe" "C:/Users/.../TMP-- Curriculum.doc" -o "C:/Users/.../CV_J.pdf"' failed with error code 1. – Mouna Jmii Mar 06 '18 at 12:33
  • The question is about `.docx` files. That is *not* the same as `.doc`. – G. Grothendieck Mar 07 '18 at 02:57
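
As the last comment points out, the pandoc command is meant for `.docx` input; a legacy `.doc` file is a different format, and pandoc does not list it among its input formats. A small guard, sketched here with a made-up function name `build_pandoc_cmd()`, can catch that before the command is run. Note also that pandoc writes PDF through an external PDF engine (by default a LaTeX installation), so one needs to be installed as well.

```r
# Hypothetical helper: build the pandoc command line, refusing anything but .docx input.
build_pandoc_cmd <- function(pandoc, input, output) {
  ext <- tolower(tools::file_ext(input))
  if (ext != "docx") {
    stop("Expected a .docx file, got '.", ext, "': ", input)
  }
  sprintf('"%s" "%s" -o "%s"', pandoc, input, output)
}

cmd <- build_pandoc_cmd("C:\\Program Files (x86)\\Pandoc\\pandoc.exe",
                        "in.docx", "out.pdf")
# shell(cmd)  # Windows only, as in the answer
```

A `.doc` file can be re-saved as `.docx` from Word before it is passed to this function.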
5

I have not been able to make the Pandoc method work at all.

However, I did figure out a way to convert .docx to PDF using RDCOMClient.

library(RDCOMClient)

file <- "C:/path/to your/doc.docx"

wordApp <- COMCreate("Word.Application")       # create the COM object
wordApp[["Visible"]] <- TRUE                   # open a visible Word application instance
wordApp[["Documents"]]$Add()                   # add a new blank document to the application
wordApp[["Documents"]]$Open(Filename = file)   # open your .docx in wordApp

# This is the key step: FileFormat = 17 saves as PDF
wordApp[["ActiveDocument"]]$SaveAs("C:/path/to your/new.pdf", FileFormat = 17)

wordApp$Quit()  # quit wordApp

I found the FileFormat=17 value here: https://learn.microsoft.com/en-us/office/vba/api/word.wdexportformat

Hopefully this helps!

  • This code worked nicely. I just added `wordApp[["ActiveDocument"]]$Close(SaveChanges = 0)` before the Quit line to close the document without saving changes. – user3357059 Mar 06 '19 at 20:05
  • Thanks. Note that on my Windows 10 machine, *this RDCOM method requires that the output file does not already exist*. So, add an object after the `file` assignment to hold the output location (`destination = "C/path_to_my_docx/texte.docx"`), then add `file.remove(destination)` before `wordApp[["ActiveDocument"]]$SaveAs(destination, FileFormat = 17)`. Also, this method requires that there are no spaces at all in the folder or file names (i.e. in the code above: the `file` object and the path in `SaveAs('C:/path')`). – Clément LVD Feb 01 '21 at 13:52
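
The two comments above suggest closing the document without saving and making sure the output file does not already exist. The second point needs no COM calls and could be sketched as follows. `fresh_destination()` is a name invented here, and the RDCOMClient lines in the trailing comments are taken from the answer and its comments; this sketch does not run them:

```r
# Invented helper: delete a stale output file, if any, and return the destination path.
fresh_destination <- function(destination) {
  if (file.exists(destination)) {
    removed <- file.remove(destination)
    if (!removed) stop("Could not remove existing file: ", destination)
  }
  destination
}

# Demonstration with a temporary file standing in for an old PDF
destination <- tempfile(fileext = ".pdf")
file.create(destination)
destination <- fresh_destination(destination)
file.exists(destination)  # FALSE: the old file is gone

# With Word and RDCOMClient available, the export would then read:
# wordApp[["ActiveDocument"]]$SaveAs(destination, FileFormat = 17)
# wordApp[["ActiveDocument"]]$Close(SaveChanges = 0)
# wordApp$Quit()
```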
3

.docx to .pdf with LibreOffice

As suggested here by JeanVuda, you can also convert .docx to .pdf with LibreOffice, assuming LibreOffice is installed on your machine.

The following code converts a .docx file to .pdf using LibreOffice:

# Path of the .docx file you want to convert
docfile <- "X:/path_to_your_docx/yourdocxfile.docx"

# Path where the LibreOffice executable is located on your machine
soffice <- "X:/path_to_libreoffice/program/soffice.exe"

# Convert .docx to .pdf with LibreOffice; shQuote() protects paths that contain spaces
system(paste(shQuote(soffice), "--headless --convert-to pdf", shQuote(docfile)), intern = TRUE)

Feedback on LibreOffice:

  1. Where my pandoc version fails to convert .docx to .pdf and RDCOMClient is not available for my version of R, LibreOffice provides a fast and direct way to convert Word documents to multiple formats.

  2. Please note that for the .pdf conversion, the tables don't render correctly in the .pdf (but are printed in landscape mode). The most direct workaround I have found is to turn my tables into images when knitting the Word document, with kableExtra::as_image(), which may not be appropriate for what you need.

  3. There are previous questions about converting to other formats from the command line here, and I guess the original answer that introduced this method to useRs is that one, in the ReporteR discussion.

Best regards

Clément LVD
  • Is it possible to use this to convert many files within folders? – ZR8 Mar 21 '22 at 14:32
  • Yes, you need a `for` loop to iterate over a list of valid paths. For example: 1) find the .docx files in your working directory and build a list of their paths: `docfile = list.files(getwd(), pattern = "\\.docx$", full.names = TRUE)`; 2) iterate with `for` to generate the PDFs. You'll need to supply the correct path for LibreOffice, since it is hardcoded here: `for(i in seq_along(docfile)){ system(paste("X:/path_to_libreoffice/program/soffice.exe --headless --convert-to pdf", docfile[i]), intern = TRUE)}` I have not tested this code yet. – Clément LVD Mar 28 '22 at 07:58
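
For the batch case discussed in the comments, a sketch along the following lines might work. It assumes LibreOffice's `--outdir` option for choosing where the PDFs are written, and the helper names `find_docx()`, `soffice_args()` and `convert_folder()` are invented for this example. `list.files()` is called with `full.names = TRUE` so that each path already includes its folder:

```r
# Invented helper: full paths of all .docx files in a folder.
find_docx <- function(folder) {
  list.files(folder, pattern = "\\.docx$", full.names = TRUE, ignore.case = TRUE)
}

# Invented helper: command-line arguments for one conversion.
# shQuote() protects paths that contain spaces.
soffice_args <- function(docfile, outdir) {
  c("--headless", "--convert-to", "pdf", "--outdir", shQuote(outdir), shQuote(docfile))
}

# Convert every .docx in `folder`; returns the exit status of each soffice call.
convert_folder <- function(soffice, folder, outdir = folder) {
  docfiles <- find_docx(folder)
  vapply(docfiles,
         function(f) system2(soffice, soffice_args(f, outdir)),
         integer(1))
}

# Example call (paths are placeholders, as in the answer):
# convert_folder("X:/path_to_libreoffice/program/soffice.exe", "X:/path_to_your_docx")
```

Returning the exit statuses makes it easier to see which files failed, since a non-zero status marks a conversion that did not succeed.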