
I'm new to R programming and wondering how I can take the contents of 1,172 text files and create a data frame with the contents of each text file in individual rows in the data frame.

So I want to go from having 1,172 text documents to having a data frame with 1,172 rows and 1 column, with each row having the contents of each individual text file. So the fifth row of the data frame would include the text from the fifth text document in the list I feed into R.

Thanks,

Tyler

Tyler
  • Have you tried anything yet? Where exactly did you get stuck? This isn't exactly a common operation, so there's no single built-in function for it, but it shouldn't be too hard to put something together. Use `list.files()` to find all the files you want to read and then map those values to `readLines()` or something else to actually read the files (see https://stackoverflow.com/questions/9068397/import-text-file-as-single-character-string). – MrFlick May 18 '18 at 20:08
  • What I did was make a corpus with the tm package, then I ran this: `vec <- sapply(corpus_object_name, print)` – Tyler May 18 '18 at 20:48

2 Answers

3
# get all files with extension "txt" in the current directory
# (pattern is a regular expression, so the dot is escaped and the end is anchored)
file.list <- list.files(path = ".", pattern = "\\.txt$", full.names = TRUE)

# this creates a character vector where each element holds the contents of one file
all.files <- sapply(file.list, FUN = function(x) readChar(x, file.info(x)$size))

# create a data frame with one row per file
df <- data.frame(files = all.files, stringsAsFactors = FALSE)

The last two steps can be combined into one to avoid creating an extra vector:

df <- data.frame( files= sapply(file.list, 
                                FUN = function(x)readChar(x, file.info(x)$size)),
                  stringsAsFactors=FALSE)
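
For comparison, here is a minimal sketch of the `list.files()` plus `readLines()` route suggested in the comments. It is an illustration under stated assumptions, not a drop-in replacement: the demo directory, the file names, the helper `read_whole()`, and the extra `file` column are all hypothetical and only there to make the example self-contained. One difference worth knowing: `readLines()` followed by `paste()` drops the trailing newline that `readChar()` would keep.

```r
# Hypothetical demo directory with three small text files (names are made up)
demo_dir <- file.path(tempdir(), "txt_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("first file", "second line"), file.path(demo_dir, "a.txt"))
writeLines("second file", file.path(demo_dir, "b.txt"))
writeLines("third file", file.path(demo_dir, "c.txt"))

# "\\.txt$" is a regular expression: a literal dot, then "txt", then end of name
paths <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read a file line by line and collapse it into one string
read_whole <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# vapply() guarantees exactly one string per file
texts <- vapply(paths, read_whole, character(1), USE.NAMES = FALSE)

# One row per file; the "file" column is an addition to keep track of the source
df_demo <- data.frame(file = basename(paths), text = texts,
                      stringsAsFactors = FALSE)
dim(df_demo)
```

Because `list.files()` returns names in alphabetical order, the row order follows the file names, so the fifth row holds the fifth file in that listing.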
Katia
  • Thank you! I would upvote this but I don't have enough of a reputation. – Tyler May 18 '18 at 20:45
  • @Tyler If this answers your question, you can mark it "accepted". You can find the "check" mark below the vote arrows. – Katia May 18 '18 at 23:38
1

I just tested this and it worked fine for me.

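# set the working directory (where the files are saved)
setwd("C:/your_path_here/")

# keep only names ending in ".txt" (any case); grepl() takes a regular
# expression, so the dot is escaped and the end of the name is anchored
file_names <- list.files(getwd())
file_names <- file_names[grepl("\\.txt$", file_names, ignore.case = TRUE)]

# print file_names vector
file_names

# note: read.csv() parses each file as comma-separated data, so the result
# has one row per line of input rather than one row per file
files <- lapply(file_names, read.csv, header = FALSE, stringsAsFactors = FALSE)
files <- do.call(rbind, files)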
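
To see how this differs from the one-row-per-file layout asked for in the question, the following small sketch reads the same file both ways. The temporary file and its contents are made up for illustration, and the `paste(readLines(...))` step is the approach from the comments, not something this answer uses.

```r
# Hypothetical two-line text file that happens to contain a comma
demo_file <- tempfile(fileext = ".txt")
writeLines(c("Hello, world", "second line"), demo_file)

# read.csv() treats the file as a table: lines become rows, commas split columns
parsed <- read.csv(demo_file, header = FALSE, stringsAsFactors = FALSE)
dim(parsed)

# Collapsing the lines instead yields a single string for the whole file
whole <- paste(readLines(demo_file), collapse = "\n")
length(whole)
```

If the text files are free-form prose, the collapsed string is usually what belongs in a single data frame cell; `read.csv()` fits better when each file really is tabular.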
ASH