Following on from my previous question: I have some texts in different rows, and from each text I am trying to generate a word table with one word per column. The problem is that the number of rows in the text column and the number of rows in the word table do not match. For some texts, two or more rows are created, so I cannot cbind the two together. I want the word table to have exactly the same number of rows as the texts, so that I can bind them and show which word-table row belongs to which text. The code is below.
texts <- c("concratulations successfully signed company please find attached quick guide can t figure immediately ", " conversation laughing services sweden", "p please find attached drafted budget p ", "p please finad attached agenda today s board meeting p ", "p hi nbsp p p please find attached darft meeting minutes today s meeting p ", "p please find attached final version minutes updated action log please let know actions done ll update excel nbsp p ", "p hi p p please find attached draft meeting minutes action log please provide comments end next week p p nice spring party saturday p p tuija p ", " p welcome team priority hope enjoy yo p ", "p please find attached flyer can study share p ", "p attached new version voice receiver p p minor change request invitation code mentioned invitation code may tell check code invitation email end alarm bell example telling new comments ", "comment etc front page now seemed end without warning p ", "p memo attached actions p ", "p please find attached updated board roles responsibilities made changes red document please review especially role relevant contact info prepare comment meeting wednesday nbsp p ", "p attached documents review please comment soonest p ")
texts <- cbind(texts)
## to remove multi-white spaces
MyDf <- gsub("\\s+"," ",texts)
MyDf <- gsub("\r?\n|\r", " ", MyDf)
MyDf <- cbind(MyDf)
colnames(MyDf) <- c("Introduction")
## attempt 1: this way, extra rows are generated for some texts
word_table <- read.table(text = paste(gsub('\n', ' ', MyDf), collapse = '\n'), fill = TRUE)
## attempt 2: this way, words in shorter texts are repeated (recycled) to match the longest text
word_table <- do.call(rbind, strsplit(as.character(MyDf), " "))
More details: the texts originally contained multiple spaces or tabs. My initial assumption was that the extra whitespace was causing the problem, but the same problem remains after removing it.
Any help would be appreciated.
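
One possible direction, offered as a minimal sketch rather than a verified fix for the full data: `read.table` with `fill = TRUE` infers the number of columns from the first five lines of input, so a later line with more words is wrapped onto extra rows, while `rbind` on vectors of unequal length recycles the shorter ones. The sketch pads each word vector with `NA` up to the longest text, and alternatively passes an explicit `col.names` to `read.table`. The three short texts and the names `clean`, `words`, `n_max`, `result` and `via_read_table` are stand-ins introduced here for illustration, not part of the original code.

```r
# Small stand-in for the real data: three texts with different word counts.
texts <- c("please find attached quick guide ",
           " conversation   laughing services",
           "p memo attached actions p hi nbsp p please find attached draft")

# Collapse runs of whitespace and trim both ends, so that splitting
# on a single space does not produce empty strings.
clean <- trimws(gsub("\\s+", " ", texts))

# One character vector of words per text.
words <- strsplit(clean, " ", fixed = TRUE)

# Option 1: pad every vector with NA up to the longest one, then stack.
# `length<-` extends a vector with NA, so nothing is recycled by rbind.
n_max <- max(lengths(words))
word_table <- do.call(rbind, lapply(words, `length<-`, n_max))

# Same number of rows as the texts, so cbind works.
result <- cbind(Introduction = clean, word_table)

# Option 2: keep read.table, but state the column count explicitly so it
# is not guessed from the first five lines. quote and comment.char are
# switched off (an assumption) so apostrophes or '#' do not split fields.
via_read_table <- read.table(text = clean, fill = TRUE,
                             col.names = paste0("V", seq_len(n_max)),
                             quote = "", comment.char = "",
                             stringsAsFactors = FALSE)
```

With either option the word table has one row per text. Option 1 marks missing words with `NA`, whereas `read.table` fills short rows with blank fields.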