I have a large pandas DataFrame with a lot of documents:
   id    text
1  doc1  Google i...
2  doc2  Amazon...
3  doc3  This was...
...
n  docN  nice camara...
How can I split each document into sentences, one per row, while carrying over its respective id?:
   id    text
1  doc1  Google is a great company.
2  doc1  It is in silicon valley.
3  doc1  Their search engine is the best
4  doc2  Amazon is a great store.
5  doc2  it is located in Seattle.
6  doc2  its new product is alexa.
7  doc2  its expensive.
8  doc3  This was a great product.
...
n  docN  nice camara I really liked it.
I tried to:
import nltk

def sentence(document):
    # Strip whitespace, then split the document into a list of sentences
    sentences = nltk.sent_tokenize(document.strip())
    return sentences

df['sentence'] = df['text'].apply(sentence)
df.stack(level=0)
However, it did not work. Any idea how to stack the sentences while carrying over the id of their source document?
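One approach that might work here, sketched below: tokenize each document into a list of sentences, then use `DataFrame.explode` (pandas >= 0.25) to turn each list element into its own row while the `id` column is repeated automatically. The sample data is made up to mirror the question's layout, and a simple regex splitter stands in for `nltk.sent_tokenize` so the snippet runs without downloading NLTK's tokenizer models; in practice `df['text'].apply(nltk.sent_tokenize)` would take its place.

```python
import re
import pandas as pd

def split_sentences(text):
    # Stand-in sentence splitter: break after ., !, or ? followed by a space.
    # In real use, nltk.sent_tokenize(text) handles edge cases far better.
    return re.split(r'(?<=[.!?])\s+', text.strip())

# Hypothetical sample data mirroring the question's tables
df = pd.DataFrame({
    "id": ["doc1", "doc2"],
    "text": [
        "Google is a great company. It is in silicon valley.",
        "Amazon is a great store. it is located in Seattle.",
    ],
})

# Replace each document with its list of sentences, then explode:
# each sentence becomes its own row, keeping the matching id.
df["text"] = df["text"].apply(split_sentences)
out = df.explode("text", ignore_index=True)
print(out)
```

With `ignore_index=True` (pandas >= 1.1) the result gets a fresh 0..n-1 index; on older versions, `reset_index(drop=True)` after the `explode` call achieves the same thing.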