I have a large set of .doc files that list the variables available in a set of corresponding datasets. I would like to scan through these in R and see which datasets contain a variable of interest. I have done this before on plain text files using readLines, but this does not work on .doc files.
I have downloaded the tm package, which should be able to read .doc files using the readDOC function, but the instructions are quite limited and I can't get it to work. Does anyone know how to use readDOC, or have another suggestion for how to do this in R? Thanks!
Thank you very much, everyone, for the replies and suggestions. I thought R might be set up to read .doc files quite easily, but from what you say, the easiest thing is to convert all the Word files to another format first. I've just downloaded some free software called 'Convert Doc': I stored all the Word documents in one folder and it converted them all to .txt files very quickly. Now I can automate the search. I have around 100 data files with accompanying Word documents that specify the variable coding, which is not always the same in each data file (e.g. for yes/no, some use 0/1 and others use 1/2). With the .txt versions I can find the right variable and store its coding using readLines, grep and a bit more text processing. Thanks!
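
In case it helps anyone else, here is a rough sketch of the kind of search I mean once the files are plain text. It uses only base R (list.files, readLines, grepl). The function name variable_lines, the demo folder, and the example file contents are all made up for illustration, and it assumes each variable's coding sits on the same line as the variable name, which may not hold for every codebook.

```r
# Hypothetical helper: for every .txt file in `dir`, return the lines that
# mention `variable`. Matching is literal and case-insensitive.
variable_lines <- function(variable, dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  out <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    # Lower-case both sides so that fixed = TRUE can still ignore case
    lines[grepl(tolower(variable), tolower(lines), fixed = TRUE)]
  })
  names(out) <- basename(files)
  # Keep only the files that actually mention the variable
  out[lengths(out) > 0]
}

# Small made-up example: three "codebooks" written to a temporary folder
demo_dir <- file.path(tempdir(), "codebooks_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("SMOKER  0 = no, 1 = yes", "AGE  years"),
           file.path(demo_dir, "study_a.txt"))
writeLines("smoker  1 = yes, 2 = no",
           file.path(demo_dir, "study_b.txt"))
writeLines("AGE  years",
           file.path(demo_dir, "study_c.txt"))

result <- variable_lines("smoker", demo_dir)
print(result)
```

The names of the returned list tell you which datasets contain the variable, and the matching lines give a starting point for pulling out the coding with more text processing. If the coding is spread over several lines, you would need to grab the lines following each match as well.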