
The structure of the files is not important to me, so following a previous solution that suggested "converting them to plain text and importing them with readLines", I changed the file type from ".doc/.docx" to ".txt" and ended up with an error:

file_list = list.files("D:/R/New", pattern="*.txt", full.names=F)
obj_list <- lapply(file_list,readLines)
Warning messages:
1: In FUN(c("adityar.txt":
  incomplete final line found on 'adityar.txt'

I have also tried reading the files with a corpus but did not get good results. The second solution here relies on PDF and Unix tools. Is there a better and faster approach? I am working on Windows. Any help is appreciated.

Aashu

1 Answer


Using Python, you can do this:

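# opendocx and getdocumenttext come from the older python-docx module API
from docx import opendocx, getdocumenttext
document = opendocx("path_to_your_docx")
# getdocumenttext returns the document's paragraphs as a list of strings
res = getdocumenttext(document)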

You can save your script and call it from R using system().
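If installing a third-party package is not convenient, the same idea can be sketched with the Python standard library alone, since a .docx file is a zip archive whose main text lives in word/document.xml. This is only a rough sketch under a few assumptions: the function name docx_to_text and the script name docx_to_text.py are made up for illustration, it handles .docx only (not the older binary .doc format), and it ignores tabs, line breaks inside paragraphs, headers, and footnotes.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the plain text of a .docx file, one line per paragraph.

    Hypothetical helper for illustration; not part of any library.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join every text run (w:t element) found inside this paragraph
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Assumed command line: python docx_to_text.py input.docx output.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2], "w", encoding="utf-8") as out:
        # The trailing newline gives the file a complete final line
        out.write(docx_to_text(sys.argv[1]) + "\n")
```

Saved under that assumed name, it could be called from R with something like system('python docx_to_text.py "D:/R/New/in.docx" "D:/R/New/in.txt"'), where both file names are placeholders, and the resulting .txt can then be read with readLines(..., encoding = "UTF-8"). Because the script ends the file with a newline, readLines should not report an incomplete final line for it.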

agstudy