
I chunked several novels into a data frame called documents. I want to export each chunk as a separate .txt file.

The data frame consists of two columns. The first column has the file name for each chunk, and the second column has the actual text that would go into the file.

documents[1,1]
[1] "Beloved.txt_1"

documents[1,2]
[1] "124 was spiteful full of a baby's venom the women......"

class(documents)
[1] "data.frame"

I'm trying to write a for loop that goes through the data frame row by row, writes the second column to a .txt file, and uses the first column as the name of that file. I've been working with something like this:

for (i in 1:ncol(documents)) {
  write(tagged_text, paste("data/taggedCorpus/",
                     documents[i], ".txt", sep=""))

I've also been reading that maybe the cat function would work well here?

Stefano
  • Please edit with the results of `dput(documents)`. – alistaire Jan 23 '16 at 23:23
  • @alistaire there's too much to copy here! What are you interested in knowing exactly? – Stefano Jan 23 '16 at 23:30
  • It doesn't have to be your data, but to get an answer, you really need to post a facsimile so people know how your data is arranged, types, classes, etc. It's best to create it with a `dput` of a real R object so it's easy for others to load it without retyping everything. – alistaire Jan 23 '16 at 23:32
  • See the [canonical post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – alistaire Jan 23 '16 at 23:33
  • @alistaire thank you! I added some context – Stefano Jan 23 '16 at 23:42

1 Answer


I'm not positive this will work for you (a little more of an example of your input and desired output would help), but one issue you've got is that your for loop runs over columns rather than rows. If you want to do this once for every row, then it needs to be `for (i in 1:nrow(documents))` rather than `ncol`.
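To make the row/column distinction concrete, here is a small sketch. The three-row data frame and its column names (`file`, `tagged_text`) are hypothetical stand-ins, not your real data. It also shows why `documents[i]` with a single index is probably not what you want: on a data frame it selects a whole column.

```r
# Hypothetical three-row stand-in for the real data frame
documents <- data.frame(
  file = c("a_1", "a_2", "a_3"),
  tagged_text = c("one", "two", "three"),
  stringsAsFactors = FALSE
)

n_cols <- ncol(documents)  # number of columns in the data frame
n_rows <- nrow(documents)  # number of rows in the data frame

# Single-bracket indexing with one index selects a column, not a row
col_pick <- documents[1]      # one-column data frame holding every file name
cell_pick <- documents[1, 1]  # a single value: row 1, column 1
```

So a loop over `1:ncol(documents)` would run only as many times as there are columns, however many chunks the data frame holds.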

Assuming that "documents" is the name of your data.frame and that the column containing the text you want to save is called "tagged_text" and the column with the file name is called "file", try this:

for (i in 1:nrow(documents)) {
  write(documents$tagged_text[i],
        paste0("data/taggedCorpus/", documents$file[i], ".txt"))
}

Note that you don't need to specify the path every time if you set the working directory to the output folder before you start the loop.
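For something you can run end to end, here is a minimal sketch of the same idea. The toy data, the column names `file` and `tagged_text`, and the temporary output folder are all assumptions for illustration; swap in your own data frame and `"data/taggedCorpus"`. It uses `writeLines()` and `file.path()`, which are one reasonable choice among several rather than the only way to do this.

```r
# Toy stand-in for the real data; the column names are assumptions
documents <- data.frame(
  file = c("Beloved.txt_1", "Beloved.txt_2"),
  tagged_text = c("first chunk of text", "second chunk of text"),
  stringsAsFactors = FALSE
)

# A temporary folder keeps the sketch runnable anywhere;
# use "data/taggedCorpus" for the real export
out_dir <- file.path(tempdir(), "taggedCorpus")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# seq_len() is safe even if the data frame has zero rows
for (i in seq_len(nrow(documents))) {
  out_path <- file.path(out_dir, paste0(documents$file[i], ".txt"))
  writeLines(documents$tagged_text[i], out_path)
}

written <- list.files(out_dir)
```

As for `cat`: `cat(documents$tagged_text[i], file = out_path)` should also work, though it does not add a trailing newline unless you append `"\n"` yourself. If your columns turn out to be factors rather than character vectors, wrapping them in `as.character()` first avoids surprises.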

shirewoman2
  • ok this makes sense! How do you set the path before the loop? Do you set it to a variable and then include that variable in the paste function? – Stefano Jan 23 '16 at 23:47
  • On the line before the `for` loop begins, you could just say `setwd("data/taggedCorpus")`. After the loop, to return to the working directory you were in before, you'd add `setwd("../..")` (the path is two folders deep, so a single `..` would only take you back up to `data`). Then, you'd just use this for the `write` command: `write(documents$tagged_text[i], paste0(documents$file[i],".txt"))` – shirewoman2 Jan 23 '16 at 23:52
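
A variation on the working-directory approach from the last comment, sketched under the same assumptions as before (toy data, hypothetical column names `file` and `tagged_text`, a temporary folder standing in for `data/taggedCorpus`). Because `setwd()` invisibly returns the previous working directory, you can save it and restore it afterwards instead of counting `..` steps.

```r
# Toy stand-in for the real data; the column names are assumptions
documents <- data.frame(
  file = c("Beloved.txt_1", "Beloved.txt_2"),
  tagged_text = c("first chunk of text", "second chunk of text"),
  stringsAsFactors = FALSE
)

# Temporary folder used so the sketch runs anywhere
out_dir <- file.path(tempdir(), "taggedCorpus_wd")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# setwd() invisibly returns the directory you were in before the call
old_wd <- setwd(out_dir)

for (i in seq_len(nrow(documents))) {
  # No folder prefix needed: files land in the current working directory
  write(documents$tagged_text[i], paste0(documents$file[i], ".txt"))
}

# Go back to where you started, however deep the output folder was
setwd(old_wd)
```

If the loop might stop with an error partway through, doing this inside a function with `on.exit(setwd(old_wd))` would restore the directory either way.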