I am trying to reproduce in Python the "exploding" tokenization
of tidytext's unnest_tokens, which returns one row per token:
> tibble(text = c('hasta la vista baby',
+ 'I am the terminator'),
+ value = c(1,2)) %>%
+ unnest_tokens(input = 'text',output = 'word', token = 'words')
# A tibble: 8 x 2
  value word
  <dbl> <chr>
1     1 hasta
2     1 la
3     1 vista
4     1 baby
5     2 i
6     2 am
7     2 the
8     2 terminator
Is it possible to do the same in pandas?
Speed of execution is my main concern here.
import pandas as pd
pd.DataFrame({'text': ['hasta la vista baby', 'I am the terminator'],
              'value': [1, 2]})
Out[3]:
                  text  value
0  hasta la vista baby      1
1  I am the terminator      2
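To make the target concrete, here is a minimal sketch of the result I am after. It assumes pandas 0.25 or later (for DataFrame.explode) and treats lowercasing plus whitespace splitting as a rough stand-in for tidytext's word tokenizer, which also strips punctuation. The helper name unnest_words is made up for illustration, and I have not benchmarked it, so it may well not be the fastest option.

```python
import pandas as pd

df = pd.DataFrame({'text': ['hasta la vista baby', 'I am the terminator'],
                   'value': [1, 2]})


def unnest_words(frame, input_col='text', output_col='word'):
    """Hypothetical helper: return one row per lowercased, whitespace-separated token.

    Assumption: whitespace splitting approximates tidytext's token = 'words';
    punctuation is NOT removed here.
    """
    # Lowercase each string and split it into a list of tokens.
    tokens = frame[input_col].str.lower().str.split()
    # Swap the text column for the list-valued token column.
    out = frame.drop(columns=[input_col]).assign(**{output_col: tokens})
    # Expand each list element into its own row, repeating the other columns.
    return out.explode(output_col).reset_index(drop=True)


result = unnest_words(df)
print(result)
```

The other columns (here value) are repeated once per token, which matches the shape of the tibble above.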
Thanks!