
I'm trying to split a huge .txt file into multiple .txt files, each containing just one paragraph.

Let me provide an example. I would need a text like this:

This is the first paragraph. It makes no sense because is just an example.

This a second paragraph, as meaningless as the previous one.

Saved as two independent .txt files containing the first paragraph (the first file) and the second paragraph (the second file).

The first file would have only: "This is the first paragraph. It makes no sense because is just an example."

And the second one: "This a second paragraph, as meaningless as the previous one."

And the same for the whole text. In the huge .txt file, paragraphs are separated by one or several empty lines. Any ideas?

Thank you very much!

JorgeF
  • `strsplit('This is the first paragraph. It makes no sense because is just an example.\n\nThis a second paragraph, as meaningless as the previous one.', split = '\\n+')` – alistaire Oct 07 '16 at 23:26
  • In the (perhaps unlikely) event that the format defines a *blank line* between paragraphs (allowing a single `\n` to continue the paragraph on a fresh line), you can modify @alistaire's comment with `strsplit(..., split = '\\n{2,}')` to split on *2 or more* newlines. – r2evans Oct 07 '16 at 23:43
  • Thank you. The problem is that I didn't mention that paragraphs are not alone on their own lines. One paragraph uses several lines. – JorgeF Oct 08 '16 at 14:56
  • Now the problem would be to regroup the independent lines into a paragraph every time that character(0) appears. – JorgeF Oct 08 '16 at 15:02
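
To illustrate the difference between the two split patterns suggested in the comments, here is a minimal sketch. The sample string `txt` is made up for illustration; it has one paragraph spanning two lines, then two empty lines, then a second paragraph.

```r
# Hypothetical sample: paragraph one spans two lines, followed by blank lines
txt <- "First line of paragraph one.\nSecond line of paragraph one.\n\n\nParagraph two."

# Splitting on one or more newlines breaks every line apart
by_line <- strsplit(txt, "\\n+")[[1]]

# Splitting on two or more newlines keeps multi-line paragraphs together
by_para <- strsplit(txt, "\\n{2,}")[[1]]

length(by_line)
length(by_para)
```

With `\\n+` each line becomes its own element, while `\\n{2,}` only breaks where at least one empty line separates the text, which seems closer to the multi-line paragraphs described above.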

1 Answer


I created a three-paragraph example, based on your comment, to recreate what I think you're describing.

    text <- "This is the first paragraph. It makes no sense because is just an example. Nothing makes sense and I'm trying to understand what I'm doing with life. This paragraph does not seem to end.
    What are we doing here.

    This a second paragraph, as meaningless as the previous one.
    There's too much to do - this is meaningless though.

    Wow, that's funny."

    # Split wherever one or more blank (or whitespace-only) lines separate paragraphs
    paras <- unlist(strsplit(text, "\n[[:space:]]*\n"))

    # Write each paragraph to its own file: paragraph1.txt, paragraph2.txt, ...
    for (i in seq_along(paras)) {
      writeLines(paras[i], paste0("paragraph", i, ".txt"))
    }

This code first assigns the sample text to the variable `text`, then uses `strsplit` to split it wherever a blank line separates two paragraphs (two newlines, possibly with whitespace or further empty lines in between). A `for` loop then goes through each element and saves it to a separate .txt file. `writeLines` writes the paragraph as-is, whereas `write.table` would add a header line and wrap the text in quotes.
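
Since the real input is a large file rather than a string, the same idea can be sketched with `readLines`, grouping consecutive non-blank lines into paragraphs. This is only a sketch under assumptions: the function name `split_paragraphs`, its arguments, and the `paragraph<N>.txt` naming are hypothetical, and the whole file is assumed to fit in memory.

```r
# Hypothetical helper: write each paragraph of `infile` to its own file in `outdir`.
split_paragraphs <- function(infile, outdir = ".", prefix = "paragraph") {
  lines <- readLines(infile, warn = FALSE)

  # TRUE for empty or whitespace-only lines
  blank <- grepl("^[[:space:]]*$", lines)

  # A paragraph starts at a non-blank line that is first or follows a blank line
  starts <- !blank & c(TRUE, head(blank, -1))

  # Running count of paragraph starts gives each line its paragraph number
  id <- cumsum(starts)

  # Group the non-blank lines by paragraph number
  paras <- split(lines[!blank], id[!blank])

  files <- file.path(outdir, sprintf("%s%d.txt", prefix, seq_along(paras)))
  for (i in seq_along(paras)) {
    writeLines(paras[[i]], files[i])
  }
  invisible(files)
}

# Usage on a small temporary file (made-up content)
infile <- tempfile(fileext = ".txt")
writeLines(c("Line one of the first paragraph.",
             "Line two of the first paragraph.",
             "", "  ",
             "The second paragraph."), infile)
outdir <- tempfile()
dir.create(outdir)
files <- split_paragraphs(infile, outdir)
```

Working line by line avoids building one giant string first, and any run of empty or whitespace-only lines counts as a single paragraph break.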

sky_megh