I have a data frame in R that can be reproduced with this code:
id1 <- c("NP", "AK", "HT")
id2 <- c("t1", "t5", "t2")
Sentence <- c("This is an example .", "This too !", "Ok")
df <- data.frame(id1, id2, Sentence)
It looks like this:
  id1 id2             Sentence
1  NP  t1 This is an example .
2  AK  t5           This too !
3  HT  t2                   Ok
I would like to restructure it into something like this, where each value in the Sentence column is split on spaces so that every token gets its own row and keeps its id1 and id2:
  id1 id2 Sentence
1  NP  t1     This
2  NP  t1       is
3  NP  t1       an
4  NP  t1  example
5  NP  t1        .
6  AK  t5     This
7  AK  t5      too
8  AK  t5        !
9  HT  t2       Ok
I know about the function strsplit, and the tm package also seems to have a tokenizer function, but I don't understand how to apply something like this inside a data frame.
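For reference, here is a minimal sketch of how strsplit might be combined with rep to produce that shape. It assumes base R only and that tokens are always separated by a single space; the names tokens and long are made up for illustration, and this is one possible approach rather than the definitive one.

```r
# Sample data from the question
df <- data.frame(
  id1 = c("NP", "AK", "HT"),
  id2 = c("t1", "t5", "t2"),
  Sentence = c("This is an example .", "This too !", "Ok"),
  stringsAsFactors = FALSE
)

# Split every sentence on single spaces; gives a list with one
# character vector of tokens per original row
tokens <- strsplit(df$Sentence, " ", fixed = TRUE)

# Repeat each row's ids once per token, then flatten the token list
long <- data.frame(
  id1 = rep(df$id1, lengths(tokens)),
  id2 = rep(df$id2, lengths(tokens)),
  Sentence = unlist(tokens),
  stringsAsFactors = FALSE
)

print(long)
```

The key step is rep with a vector of counts, which duplicates each id as many times as its sentence has tokens so the ids stay aligned with the flattened tokens.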
Thank you!