As I am new to R and fairly new to programming in general, I hope someone can help me. I have a matrix with a body of text in column 2. For further analysis I want to split that text into multiple parts of equal length, i.e. the same number of words per part (starting with 2 parts). I also want to process those new parts further, so I would like them added to the existing matrix as new columns.
I have found the split function and was wondering whether I can solve my problem with it.
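One way split could be used here is sketched below. It is only an illustration under assumptions: split_into_parts is a hypothetical helper name, the small news matrix is made-up data, words are taken to be whitespace-separated, and when the word count is not divisible by the number of parts the later parts get the extra words.

```r
# split_into_parts is a hypothetical helper, not a base R or tm function.
split_into_parts <- function(text, n_parts = 2) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  # Map word positions 1..length(words) onto chunk numbers 1..n_parts
  chunk <- ceiling(seq_along(words) * n_parts / length(words))
  # split() groups the words by chunk; fixed levels keep empty chunks
  parts <- split(words, factor(chunk, levels = seq_len(n_parts)))
  vapply(parts, paste, character(1), collapse = " ", USE.NAMES = FALSE)
}

# Made-up example data in the same layout as described above
news <- matrix(c("1", "2",
                 "one two three four five",
                 "six seven eight nine",
                 "5", "4"), nrow = 2, ncol = 3)
colnames(news) <- c("index", "news body", "word count")

n_parts <- 2
pieces <- vapply(news[, "news body"], split_into_parts,
                 character(n_parts), n_parts = n_parts, USE.NAMES = FALSE)
# One row per text, one column per part
parts <- matrix(pieces, ncol = n_parts, byrow = TRUE)
colnames(parts) <- paste0("part", seq_len(n_parts))

# Append the parts to the existing matrix as new columns
news_split <- cbind(news, parts)
print(news_split)
```

Because cbind builds a new, wider matrix, no cell is ever assigned outside the existing columns.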
Also, can I implement a dynamic word counter ("count every word in a message until value is greater than...")?
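A word counter along those lines might look like the sketch below. It assumes whitespace-separated words, and count_words and take_words are hypothetical helper names rather than existing functions.

```r
# Count the whitespace-separated words in each element of a character vector.
count_words <- function(text) {
  lengths(strsplit(trimws(text), "\\s+"))
}

# Keep only the first `limit` words of a single message.
take_words <- function(text, limit) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  paste(head(words, limit), collapse = " ")
}

msg <- "The throat, neck sides and underparts are white"
print(count_words(msg))
print(take_words(msg, 3))
```

Taking the first few words with head avoids an explicit loop that counts up to a threshold.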
Any tips on how to proceed would be greatly appreciated. Thanks in advance.
EDIT 2: My code so far looks like this:
library(tm)
library(NLP)
TestMatrix2 = matrix(c("1", "2", "The masked shrike (Lanius nubicus) is a bird in the shrike family, Laniidae. It breeds in southeastern", "The throat, neck sides and underparts are white, with orange flanks and breast", "17", "13"), 2, 3)
colnames(TestMatrix2) = c("index", "news body", "word count")
Test2 <- data.frame(strsplit(TestMatrix2[[1,1]], " "), stringsAsFactors=FALSE)
NewsPartitioning <- function(NumberOfParts = 2, NewsIndicator = 1){
  MaxWords = TestMatrix2[NewsIndicator,3]
  CritValue = TestMatrix2[NewsIndicator,3]/NumberOfParts
  as.integer(CritValue)
  new = list()
  colnames(Test2) = c("Words")
  for (i in 1:CritValue){
    new = c(new, Test2$Words[i])
  }
  new = unlist(new)
  TestMatrix[NewsIndicator,3+NumberOfParts] = paste(new, collapse = " ")
  for (i in CritValue+1:MaxWords){
    new = c(new, Test2$Words[i])
  }
  new = unlist(new)
  TestMatrix[Nachricht,3+NumberOfParts] = paste(new, collapse = " ")
}
At the moment, running this gives me the error message "new columns would leave holes after existing columns".
I guess the procedure is neither efficient nor very elegant. Any thoughts or help?
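As far as I can tell, this message comes from assignment into a data frame (not a matrix) when the target column index skips past the next free column, for example writing to column 5 of a 3-column data frame. The sketch below reproduces that under assumptions: df is a made-up 3-column data frame standing in for TestMatrix, and part1 and part2 are hypothetical column names.

```r
# Made-up stand-in for the real data: a data frame with 3 columns
df <- data.frame(index = 1:2,
                 body = c("a b c d", "e f g h"),
                 words = c(4, 4),
                 stringsAsFactors = FALSE)

# Writing to column 5 while column 4 does not exist yet triggers the error
msg <- tryCatch({
  df[1, 5] <- "a b"
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Creating the target columns first avoids the gap
df$part1 <- NA_character_
df$part2 <- NA_character_
df[1, "part1"] <- "a b"
df[1, "part2"] <- "c d"
print(df)
```

With 2 parts, 3 + NumberOfParts points at column 5, which would explain the gap if the real object has only 3 columns.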
Best regards, Basti