I have this code that works for me (it's from Jockers' Text Analysis with R for Students of Literature). However, I need to automate it: I need to run the "ProcessingSection" on up to thirty individual text files. How can I do this? Can I have a table or data frame that holds thirty versions of "text.v", one for each scan("*.txt")?

Any help is much appreciated!

# Chapter 5 Start up code

setwd("D:/work/cpd/R/Projects/5/")

text.v <- scan("pupil-14.txt", what="character", sep="\n")
length(text.v)


#ProcessingSection
text.lower.v <- tolower(text.v)
mars.words.l <- strsplit(text.lower.v, "\\W")
mars.word.v <- unlist(mars.words.l)

#remove blanks
not.blanks.v <- which(mars.word.v!="")
not.blanks.v

#create a new vector to store the individual words
mars.word.v <- mars.word.v[not.blanks.v]
mars.word.v

1 Answer

It's hard to help as your example is not reproducible.

Assuming you're happy with the result of mars.word.v, you can turn this portion of code into a function that accepts a single argument, the result of scan.

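processing_section <- function(x){
  # lower-case, split on non-word characters, flatten into one vector
  words <- unlist(strsplit(tolower(x), "\\W"))
  # drop the empty strings left between adjacent separators
  words[words != ""]
}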
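
As a quick check, the function can be tried on an in-memory vector before pointing it at real files. This is only a sketch: sample.v is a made-up stand-in for the result of scan(), and the function is written out again with the blank-removal step from the question, since splitting on "\\W" leaves an empty string wherever two separators sit next to each other (for example a comma followed by a space).

```r
# Hypothetical sample standing in for the output of scan(); not taken from the real files
sample.v <- c("The Martians came, slowly.", "We watched; nobody spoke!")

processing_section <- function(x) {
  # lower-case, split on single non-word characters, flatten to one vector
  words <- unlist(strsplit(tolower(x), "\\W"))
  # remove the empty strings produced by adjacent separators
  words[words != ""]
}

sample.words.v <- processing_section(sample.v)
sample.words.v          # the individual lower-cased words
length(sample.words.v)  # how many words were found
```

If the blank-removal line is left out, the result still contains "" entries, which would inflate any word counts computed later.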

Then, if all .txt files are in the current working directory, you should be able to list them, and apply this function with:

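lf <- list.files(pattern="\\.txt$")  # pattern is a regular expression: names ending in .txt
words.l <- lapply(lf, function(path) processing_section(scan(path, what="character", sep="\n")))
names(words.l) <- lf  # access one text with e.g. words.l[["pupil-14.txt"]]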
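
If you also want to keep each file's raw text (one "text.v" per file), a named list is probably the simplest container: the vectors will have different lengths, so they do not fit naturally into the columns of a data frame. The following is a minimal sketch, not the only way to do it. The folder demo.dir, the two tiny sample files, and the names files.v, texts.l and words.l are all made up for illustration so that the code runs anywhere; in practice you would list the files in your own folder.

```r
# Hypothetical demo folder with two tiny files, created only so this sketch is runnable
demo.dir <- file.path(tempdir(), "pupil-demo")
dir.create(demo.dir, showWarnings = FALSE)
writeLines(c("Mars is red.", "It is far away."), file.path(demo.dir, "pupil-01.txt"))
writeLines("We landed, finally!", file.path(demo.dir, "pupil-02.txt"))

# List the .txt files (pattern is a regular expression)
files.v <- list.files(demo.dir, pattern = "\\.txt$", full.names = TRUE)

# One list element per file: the raw lines, just like text.v for a single file
texts.l <- lapply(files.v, scan, what = "character", sep = "\n", quiet = TRUE)
names(texts.l) <- basename(files.v)

# The processed words for each file, kept in a parallel named list
words.l <- lapply(texts.l, function(x) {
  w <- unlist(strsplit(tolower(x), "\\W"))
  w[w != ""]
})

texts.l[["pupil-01.txt"]]  # access a text by file name
texts.l[[2]]               # or by position
sapply(words.l, length)    # number of words per file
```

Because lapply() keeps the names of its input, words.l can be indexed with the same file names as texts.l.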

Is this what you want?

Vincent Bonhomme
  • Thanks, Vincent. Would it be easier to have a function return a table or data frame (sorry, newbie) containing a list of text.v vectors? That is, a list of vectors returned from scanning multiple .txt files, where I then access each 'text.v' using a subscript on that list or data frame of vectors? – Hugh O'Donnell Apr 07 '16 at 20:24
  • It depends on what you want to do, but I'm not sure I understand your question. – Vincent Bonhomme Apr 07 '16 at 20:51
  • I really appreciate this, Vincent. Can I read in a number of text files (students' stories) so that, after each one is read in by scan(), it is stored in a vector, and a list or table of those vectors, holding the contents of the text files, is returned? – Hugh O'Donnell Apr 07 '16 at 20:54
  • Amazing - you make it look so simple. Thank you. I am hoping to begin a Doctorate in Education - I'm interested in English Language use - spoken and written - when pupils (13-14 year olds) use a multi-user simulation program to learn and write about Science. I thank you on their behalf! – Hugh O'Donnell Apr 08 '16 at 20:41