
I have some sentences, and I want to split each one into its words so that every sentence becomes one row vector. However, the words of the shorter sentences are repeated to match the length of the longest sentence's row vector, which I do not want. No matter how long a sentence is, I want its row to contain its words only once.

sentence <- c("case sweden", "meeting minutes ht board meeting st march now also attachment added agenda today s board meeting", "draft meeting minutes board meeting final meeting minutes ht board meeting rd april")
sentence <- cbind(sentence)
word_table <- do.call(rbind, strsplit(as.character(sentence), " "))
test <- cbind(sentence, word_table)

This is what I get now: the words of the shorter sentences are repeated across the row.

And this is what I want: each sentence's words appear only once, with no repetition.

bim
  • Data frames work as data structures with the same number of entries per row; a list-based structure might be more efficient? – user5219763 Mar 07 '16 at 22:55
  • Yeah, either a list structure or a "long" dataframe, with string ID in one col and words in the second col. – Frank Mar 07 '16 at 22:59
  • For example, for the third sentence, which is the largest, `read.table` is creating one extra row, so the three sentences now become 4 rows in total, which is not expected :( – bim Mar 07 '16 at 23:01
  • Aha, I see. Yes, it is working now. Thanks @rawr – bim Mar 07 '16 at 23:04
  • Thank you very much guys, stackoverflow is really wonderful; discussing with you all really solved my problem in the shortest time span. :) – bim Mar 07 '16 at 23:07
  • Please post it as an answer; it is working wonderfully on a big data set as well, I just tested. – bim Mar 07 '16 at 23:14
  • Okay, I am posting it, but if you posted it as an answer, wouldn't it give you extra points? I am new to Stack Overflow, and I cannot see an option to accept your comment as the answer. – bim Mar 07 '16 at 23:18
  • Sorry, this way it is not working although there is no extra white space or tab. – bim Mar 09 '16 at 16:01

1 Answer


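The solution from rawr: let `read.table` split the sentences, with `fill = TRUE` so that shorter rows are padded instead of having their words repeated.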

sentence <- c("case sweden", "meeting minutes ht board meeting st march now also attachment added agenda today s board meeting", "draft meeting minutes board meeting final meeting minutes ht board meeting rd april")
dd <- read.table(text = paste(sentence, collapse = '\n'), fill = TRUE)
test <- cbind(sentence, dd)
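
One caveat, in case the result comes out with extra rows on a larger data set: according to the `read.table` documentation, the number of columns is determined from the first five lines of input only, unless `col.names` is longer. If the longest sentence comes later, its words can wrap onto an additional row. A minimal sketch of a workaround, using made-up sentences and the hypothetical names `x`, `n_col` and `dd_all`, counts the fields first and passes explicit column names:

```r
# Made-up input: the longest sentence is on line 6, past the first five lines
x <- c(rep("case sweden", 5), "draft meeting minutes board")

# Count the whitespace-separated fields on every line, the same way read.table splits them
tc <- textConnection(x)
n_col <- max(count.fields(tc, sep = ""))
close(tc)

# Supplying n_col column names makes read.table allocate enough columns for every line
dd_all <- read.table(text = paste(x, collapse = "\n"), fill = TRUE,
                     col.names = paste0("V", seq_len(n_col)),
                     stringsAsFactors = FALSE)
```

If the real sentences can contain apostrophes, double quotes or `#`, then `quote = ""` and `comment.char = ""` are probably needed as well, since `read.table` treats those characters specially by default.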

Or, removing any newline characters embedded in the sentences first,

cc <- read.table(text = paste(gsub('\n', '', sentence), collapse = '\n'), fill = TRUE)
test1 <- cbind(sentence, cc)
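
An alternative that avoids `read.table` altogether, along the lines of the list structure suggested in the comments: the repetition in the question comes from `rbind` recycling the shorter vectors, so padding each word vector with `NA` to a common length first removes it. This is only a sketch, not rawr's method, and `words`, `n_max`, `padded`, `word_table2` and `test2` are names made up here.

```r
sentence <- c("case sweden",
              "meeting minutes ht board meeting st march now also attachment added agenda today s board meeting",
              "draft meeting minutes board meeting final meeting minutes ht board meeting rd april")

# Split each sentence on single spaces; the result is a list of character vectors
words <- strsplit(sentence, " ", fixed = TRUE)

# Pad every vector with NA up to the longest length, so rbind() has nothing to recycle
n_max <- max(lengths(words))
padded <- lapply(words, function(w) c(w, rep(NA_character_, n_max - length(w))))
word_table2 <- do.call(rbind, padded)

# One row per sentence: the original sentence followed by its words
test2 <- data.frame(sentence, word_table2, stringsAsFactors = FALSE)
```

Here the unused cells are `NA` rather than empty strings, and the `words` list itself can be kept if a ragged structure is enough.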

Thanks.

bim