I'm using SpaCy
with Pandas
to get a sentence tokenised with Part of Speech (POS)export to excel. The code is as follow:
import spacy
import xlsxwriter
import pandas as pd
nlp = spacy.load('en_core_web_sm')
text ="""He is a good boy."""
doc = nlp(text)
for token in doc:
x=[token.text, token.lemma_, token.pos_, token.tag_,token.dep_,token.shape_, token.is_alpha, token.is_stop]
print(x)
When I print(x)
I get the following:
['He', '-PRON-', 'PRON', 'PRP', 'nsubj', 'Xx', True, False]
['is', 'be', 'VERB', 'VBZ', 'ROOT', 'xx', True, True]
['a', 'a', 'DET', 'DT', 'det', 'x', True, True]
['good', 'good', 'ADJ', 'JJ', 'amod', 'xxxx', True, False]
['boy', 'boy', 'NOUN', 'NN', 'attr', 'xxx', True, False]
['.', '.', 'PUNCT', '.', 'punct', '.', False, False]
To the token loop, I added the DataFrame as follow: for token in doc:
for token in doc:
x=[token.text, token.lemma_, token.pos_, token.tag_,token.dep_,token.shape_, token.is_alpha, token.is_stop]
df=pd.Dataframe(x)
print(df)
Now, I stat to get the following format:
0
0 He
1 -PRON-
2 PRON
3 PRP
4 nsubj
5 Xx
6 True
7 False
........
........
However, when I try exporting the output (df) to excel using Pandas
as the following code, it only shows me the last iteration of x in the column
df=pd.DataFrame(x)
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
df.to_excel(writer,sheet_name='Sheet1')
Output (in Excel Sheet):
0
0 .
1 .
2 PUNCT
3 .
4 punct
5 .
6 False
7 False
How I can have all the iterations one after the other in the new column in this scenario as follow?
0 He is ….
1 -PRON- be ….
2 PRON VERB ….
3 PRP VBZ ….
4 nsubj ROOT ….
5 Xx xx ….
6 True True ….
7 False True ….