
I have a folder that contains *.docx files. I want to convert the script below into some sort of loop or function that reads all the docx files, but I really don't know how to write an R function. Could someone please guide me?

library(docxtractr)
real_world <- read_docx("C:/folder/doc1.docx")
docx_tbl_count(real_world)
tbls <- docx_extract_all_tbls(real_world)
a <- as.data.frame(tbls)

So ideally it appends a new table every time a new document is extracted.

Thanks, Peddie

Mike Wise
PeddiePooh
    Use the same general idea as [here](http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r) – Rich Scriven Dec 20 '16 at 21:20

2 Answers


I don't know whether your code works as intended. But here I converted it into a function with a path argument, so that you can batch process all docx files under that path (don't put a slash at the end of the path). The default argument is the current working directory:

library(docxtractr)

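docxextr <- function(pathh = ".") {
    # only pick up the .docx files in the folder
    files <- list.files(path = pathh, pattern = "\\.docx$")
    results <- list()
    for (i in files) {
        filen <- sprintf("%s/%s", pathh, i)
        real_world <- read_docx(filen)
        docx_tbl_count(real_world) # didn't understand where this count goes?
        tbls <- docx_extract_all_tbls(real_world)
        # keep one result per file, named after the file
        results[[i]] <- as.data.frame(tbls)
    }
    # return after the loop; a return() inside it would stop at the first file
    return(results)
}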
Serhat Cevikel

Edit: I assumed for this answer that the term "function" was not used in the sense of an R function by OP. I think OP means just an algorithm to solve the problem.

#### load packages ####
library(docxtractr)
library(plyr)

#### load data ####
# define path of dir
pathto <- "stackoverflow/41251392/example/"
# get path of every .docx-file in dir
filelist <- list.files(path = pathto, pattern = "\\.docx$", full.names = TRUE)
# read every file with docxtractr::read_docx()
tablelist <- lapply(filelist, read_docx)
# extract every table from every file with docxtractr::docx_extract_all_tbls()
tables <- lapply(tablelist, docx_extract_all_tbls)

#### append data to create one data.frame #### 
# combine extracted tables with plyr::ldply()
ldply(lapply(tables, function(x) {ldply(x, data.frame)}), data.frame)

The last line is a bit difficult to understand. Take a look at ?plyr::ldply.
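To make that last line less opaque, here is a minimal sketch on made-up data, so no .docx files are needed. The toy `tables` list is an assumption that only mimics the shape of the real result (one list element per file, each holding that file's tables), and the column names are invented for illustration. The inner `ldply()` stacks the tables of one file, the outer `ldply()` stacks the per-file results, and columns missing from a table are filled with `NA`.

```r
library(plyr)

# Hypothetical stand-in for lapply(tablelist, docx_extract_all_tbls):
# one element per file, each element a list of that file's tables.
tables <- list(
  list(
    data.frame(id = 1:2, name = c("a", "b")),
    data.frame(id = 3L, name = "c")
  ),
  list(
    data.frame(id = 4L, score = 9.5)
  )
)

# Inner step: stack all tables of one file into a single data.frame.
per_file <- lapply(tables, function(x) ldply(x, data.frame))

# Outer step: stack the per-file data.frames into one data.frame.
combined <- ldply(per_file, data.frame)
```

In this sketch `combined` should hold the rows of all three toy tables, with the union of their columns.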

nevrome
  • I did try using lapply, but I ended up with a list object. However, your last line actually did the magic and I have an object that I can actually work with. Cheers. – PeddiePooh Dec 21 '16 at 07:29