I have 257 .txt files, each containing a bunch of Q+A transcripts. I want to extract the text from all of them into a single vector in R. Most of the related questions involve reading multiple files into a data frame or table. I don't want either of those, just one huge chunk of text.

I did successfully get all the files in:

QA_all <- choose.files()

But beyond that I'm stumped. A solution mentioned in "Import multiple text files in R and assign them names from a predetermined list" seemed to approximate what I want, but it produces a list. I was able to extract the text items from the list into a vector, then flatten it and collapse it into one string:

#extract text from files and put in a vector
data_list = lapply(QA_all, function(file) scan(file, what = "character"))

text <- c(data_list[1:257])
flat.list <- unlist(text, recursive = TRUE, use.names = TRUE)

#remove lines
QA.vector <- paste(flat.list, collapse=" ")

but I wonder if I can do this directly, without having to create a list with lapply(). Is there a more direct way to extract the text from several files and put it into one contiguous unit of text in R?

Wangana
  • 2
    `c(lapply(QA_all, readLines))` gets you all files concatenated into a single vector, do you need to use `scan` for a particular reason? – r2evans Mar 08 '19 at 23:03
  • Thank you for your response, it was helpful. To be honest, I'm not 100% sure what scan() is for; I thought it was necessary to get the text as character data. When I use the line you gave me, it still creates a list and I still have to use the unlist() function. Also, when I use it I get a warning message for 50 of the 247 files that says: In FUN(X[[i]], ...) : incomplete final line found on (file name). I'm not sure what it means; when I inspect the lines I can't find any issues. – Wangana Mar 11 '19 at 02:12
  • 1
    `unlist`, sure. The warning appears because every line in the file ends with LF or CRLF (depending on its origin) except the last line, which has no line ending at all. It may not be the majority of files, but it happens often enough for me. That is absolutely not a problem. You can silence those warnings (crudely) with `suppressWarnings(...)`. – r2evans Mar 11 '19 at 13:55
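
Putting the comments above together, here is a minimal sketch of one way to go from file paths to a single string. It is not an accepted answer from the thread. The temporary demo files and the use of list.files() in place of the interactive choose.files() are assumptions made only so the example runs on its own. It also swaps scan() for readLines(), which reads whole lines rather than whitespace-separated words, and uses the `warn = FALSE` argument of readLines() instead of `suppressWarnings(...)` to silence the "incomplete final line" warning.

```r
# Hypothetical demo files standing in for the 257 transcripts.
demo_dir <- tempfile("qa_demo_")
dir.create(demo_dir)
writeLines(c("Q: first question", "A: first answer"),
           file.path(demo_dir, "a.txt"))
# cat() without a trailing "\n" leaves an incomplete final line,
# which is the situation that triggers the readLines() warning.
cat("Q: second question\nA: second answer",
    file = file.path(demo_dir, "b.txt"))

# In the question, QA_all comes from choose.files(); list.files() is a
# non-interactive stand-in used here for illustration.
QA_all <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Read every file as lines, flatten the resulting list into one character
# vector, and join everything into a single string.
QA.vector <- paste(
  unlist(lapply(QA_all, readLines, warn = FALSE), use.names = FALSE),
  collapse = " "
)
```

lapply() (or a loop) is still doing the per-file reading here, since readLines() takes one file at a time, but the intermediate list never needs a name of its own. If the line structure should survive, `collapse = "\n"` could be used instead of a space.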
