
I have about 1 million txt files, each containing written text (on average 1,000 words per file, written as in a book). A screenshot of one txt file with fewer words is attached (image of txt).

I wish to create a table in R with two columns and 1 million rows: column 1 is the name of the txt file, column 2 is all text in the txt file with one row per txt file.

I have two challenges with this (to which I have not yet found answers on Stack Overflow):

  1. My txt files contain written text, but it is laid out across many rows and columns in an unstructured manner, and each txt file differs from the others. I therefore need a function that takes ALL the text in a txt file and treats it as one "cell".
  2. I have not yet found a function that lets me load this many txt files and combine them into one large table (without R stopping working). Is there a smart solution to that?
Laura Ne
  • Welcome to SO! Your question is too broad: it asks for general advice and has neither data nor code. It's very difficult to answer, and it could be closed. I understand what you need, but without any kind of data it's pretty hard to answer (and, reading your question, any answer might not be sufficient). Try to be more specific, and maybe add some example data using `dput()`. – s__ Aug 20 '19 at 14:13
  • It sounds like you want to build a corpus. I would check out the `tm` package, specifically the `VCorpus` and `DirSource` functions. Once you have a corpus you can create the dataframe a bit more easily. – emilliman5 Aug 20 '19 at 14:25
  • Thanks s_t for your feedback. I have now attached a screenshot of one of the txt files to my question (as I have not found a way to attach the txt file itself), but I am not sure whether that is of any help. – Laura Ne Aug 20 '19 at 14:27
  • I agree with the corpus suggestion. – qwr Aug 20 '19 at 14:28
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and a clear explanation of what you're trying to do and what hasn't worked. – camille Aug 20 '19 at 14:35

1 Answer


One way to implement this in code using base R:

df <- data.frame(
  # text_files is a character vector of filepaths
  file = text_files,
  text = vapply(text_files, function(x) paste(readLines(x), collapse = "\n"), character(1)), 
  row.names = NULL,
  stringsAsFactors = FALSE
)
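
As a hedged extension of the snippet above (a sketch, not a tested solution for 1 million files), the following shows one way `text_files` could be built and how the files could be read in batches. The demo folder, the two sample files, the helper name `read_whole_file`, and the batch size are all assumptions made for illustration; `readChar()` is swapped in for `readLines()` here because it returns the whole file as a single string in one call.

```r
# Demo setup (assumption): two small files in a temporary folder stand in
# for the real txt files. Point txt_dir at the actual folder instead.
txt_dir <- file.path(tempdir(), "txt_demo")
dir.create(txt_dir, showWarnings = FALSE)
writeLines(c("First line", "  second   line"), file.path(txt_dir, "a.txt"))
writeLines("Only one line", file.path(txt_dir, "b.txt"))

# Collect the full paths of all .txt files in the folder.
text_files <- list.files(txt_dir, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical helper: read one file as a single string, line breaks included.
read_whole_file <- function(path) {
  readChar(path, nchars = file.size(path), useBytes = TRUE)
}

# Split the paths into batches; the batch size is an arbitrary choice.
batch_size <- 10000
batches <- split(text_files, ceiling(seq_along(text_files) / batch_size))

# Build one small data frame per batch.
pieces <- lapply(batches, function(paths) {
  data.frame(
    file = basename(paths),
    text = vapply(paths, read_whole_file, character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
})

# Combine the per-batch data frames into one table with one row per file.
df <- do.call(rbind, unname(pieces))
rownames(df) <- NULL
```

Working in batches makes it possible to report progress or save each piece (for example with `saveRDS()`) before combining. Whether the full table fits in memory still depends on the total size of the text.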
s_baldur