My question is related to this past question of mine: Split text in cells and create additional rows for the tokens.
Let's suppose that I have the following DataFrame in pandas:
id text
1 I am the first document and I am very happy.
2 Here is the second document and it likes playing tennis.
3 This is the third document and it looks very good today.
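For reproducibility, the example data can be built roughly like this (a minimal sketch; the variable name `df` is only an assumption for illustration):

```python
import pandas as pd

# Example input: one row per document, with an id and its full text.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": [
        "I am the first document and I am very happy.",
        "Here is the second document and it likes playing tennis.",
        "This is the third document and it looks very good today.",
    ],
})
```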
I want to split the text of each id into chunks of a random number of words (varying between two values, e.g. 1 and 5), so that I end up with something like the following:
id text
1 I am the
1 first document
1 and I am very
1 happy
2 Here is
2 the second document and it
2 likes playing
2 tennis
3 This is the third
3 document and
3 it looks very
3 good today
Keep in mind that my DataFrame may also have other columns besides these two, which should simply be copied to the new DataFrame in the same way as id above.
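To make the requirement concrete, here is a minimal baseline sketch of the behaviour I am after. It is not necessarily efficient, and the function name `split_random_chunks` and its parameters (`col`, `low`, `high`, `seed`) are hypothetical names chosen for illustration. It assumes whitespace tokenization and pandas 1.1 or later (for `ignore_index` in `explode`).

```python
import numpy as np
import pandas as pd


def split_random_chunks(df, col="text", low=1, high=5, seed=None):
    """Split df[col] into chunks of low..high words, one output row per chunk."""
    rng = np.random.default_rng(seed)

    def chunk(text):
        words = text.split()  # assumption: plain whitespace tokenization
        chunks, start = [], 0
        while start < len(words):
            # integers() excludes the upper bound, hence high + 1
            size = int(rng.integers(low, high + 1))
            # the last chunk may be shorter than the drawn size
            chunks.append(" ".join(words[start:start + size]))
            start += size
        return chunks

    out = df.copy()
    out[col] = out[col].map(chunk)
    # explode repeats every other column (id and any extras) for each chunk
    return out.explode(col, ignore_index=True)
```

Calling `split_random_chunks(df, low=1, high=5)` on a DataFrame with `id`, `text`, and any extra columns should return one row per chunk, with the other columns repeated. Since the chunk sizes are random, the exact split differs between runs unless `seed` is fixed.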
What is the most efficient way to do this?