
I'm trying to split a document into paragraphs in R:

test.text <- c("First paragraph.  Second sentence of 1st paragraph.

           Second paragraph.")
# When we run the below, we see separation of \n\n between the 2nd and 3rd sentences
test.text

# This outputs the desired 2 blank lines in the console
writeLines("\n\n")

a <- strsplit(test.text, "\\n\\n")

It's not splitting properly.

matsuo_basho

1 Answer


The output of strsplit is a list, with one element per input string. There are also spaces after the \n\n, so the pattern has to match those too. To get a character vector, extract the first list element with [[1]] or call unlist:

a <- strsplit(test.text, "\n+\\s+")[[1]]
a
#[1] "First paragraph.  Second sentence of 1st paragraph." "Second paragraph."        
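
A minimal sketch of the unlisting route mentioned above, assuming the same input as in the question. The variable name `paras` and the pattern `"\n\\s*\n"` are illustrative choices, not taken from the answer. Here the split happens on the blank line only, and `trimws` (base R 3.2.0 or later) strips the leftover indentation afterwards:

```r
# Same text as in the question: two paragraphs separated by a blank line,
# with the second paragraph indented by spaces.
test.text <- "First paragraph.  Second sentence of 1st paragraph.\n\n           Second paragraph."

# strsplit() returns a list with one element per input string,
# so unlist() flattens the result into a plain character vector.
# The pattern matches a newline, optional whitespace, and another newline.
paras <- unlist(strsplit(test.text, "\n\\s*\n"))

# Remove leading and trailing whitespace left on each paragraph.
paras <- trimws(paras)
paras
```

Separating the split from the trimming keeps the pattern focused on paragraph breaks, which may be easier to adapt if the indentation varies.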
akrun
  • Why does `\n` work without the double backslash `\\`? Do you have a reference? (Not on regex, but on R's behaviour.) – Rui Barradas Sep 28 '17 at 20:39
  • @RuiBarradas According to `?regex`: "Escaping non-metacharacters with a backslash is implementation-dependent. The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as LF, \r as CR and \t as TAB. (Note that these will be interpreted by R's parser in literal character strings.)" – akrun Sep 28 '17 at 20:47
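
The point in the comment above can be checked with a small sketch. The string `x` and the variable names are made up for illustration. With `"\n"`, R's parser produces a literal newline character before the regex engine sees the pattern. With `"\\n"`, the engine receives a backslash followed by `n` and interprets that as LF itself. Both end up matching the same text, unless `fixed = TRUE` turns off regex interpretation:

```r
x <- "one\n\ntwo"

# The two pattern strings differ in length:
nchar("\n")   # 1: a single newline character
nchar("\\n")  # 2: a backslash followed by the letter n

# As regular expressions, both match a newline, so the splits agree.
same <- identical(strsplit(x, "\n\n"), strsplit(x, "\\n\\n"))
same

# With fixed = TRUE the pattern is taken literally, so "\\n\\n"
# looks for backslash-n-backslash-n, which x does not contain.
fixed_split <- strsplit(x, "\\n\\n", fixed = TRUE)
fixed_split
```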