My data frame contains several collected articles: df$title holds the title and df$text holds the content of each article. I need to break each article down into several paragraphs. Here is how I break down just ONE article (text1 is the text of that article):
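library(stringr)
# Matches the honorifics "Mr. ", "Mrs. " and "Ms. " (each followed by whitespace)
pattern <- "\\bM(?:rs?|s)\\.\\s"
# Replace each honorific with a marker, then split the article on that marker
aa <- str_replace_all(text1, pattern, "XXXX")
bb <- unlist(strsplit(aa, "XXXX"))
# Drop the text before the first honorific
cc <- bb[-1]
# Replace backslashes, then any remaining non-alphanumeric characters, with spaces
dd <- gsub("[\\]", " ", cc)
paragraph_vector <- gsub("[^[:alnum:]]", " ", dd)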
How can I apply this breakdown to the whole column (df$text) and label each paragraph with the title of its article? I want each paragraph to become one observation (instead of one article per observation).
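To make the goal concrete, here is a minimal sketch of the shape I am after, not a definitive solution. The example df, the helper name split_article and the result name paragraphs are made up for illustration. It splits directly with base strsplit(perl = TRUE) instead of going through the "XXXX" marker, and it skips the separate backslash step because the [^[:alnum:]] replacement already turns backslashes into spaces.

```r
# Hypothetical example data standing in for the real df
df <- data.frame(
  title = c("Article A", "Article B"),
  text  = c("Intro. Mr. Smith said hello. Mrs. Jones replied!",
            "Preface Ms. Lee spoke."),
  stringsAsFactors = FALSE
)

pattern <- "\\bM(?:rs?|s)\\.\\s"

# Hypothetical helper: split one article into cleaned paragraphs
split_article <- function(text) {
  # perl = TRUE is needed for the (?:...) non-capturing group
  pieces <- unlist(strsplit(text, pattern, perl = TRUE))
  pieces <- pieces[-1]                 # drop the text before the first honorific
  gsub("[^[:alnum:]]", " ", pieces)    # non-alphanumerics (incl. backslashes) -> spaces
}

# One character vector of paragraphs per article
paragraph_list <- lapply(df$text, split_article)

# One row per paragraph, with the article title repeated for each of its paragraphs
paragraphs <- data.frame(
  title     = rep(df$title, lengths(paragraph_list)),
  paragraph = unlist(paragraph_list),
  stringsAsFactors = FALSE
)
```

With this sketch, an article that contains no honorific contributes zero rows, since everything before the first match is dropped.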