How to remove square brackets in result pos_tag

Question

I want to extract nouns from a DataFrame. I do it as follows:

import pandas as pd
import nltk
from nltk.tag import pos_tag
df = pd.DataFrame({'pos': ['noun', 'Alice', 'good', 'well', 'city']})
noun=[]
for index, row in df.iterrows():
    noun.append([word for word,pos in pos_tag(row) if pos == 'NN'])
df['noun'] = noun

and I get this for df['noun']:

0     [noun]
1    [Alice]
2         []
3         []
4     [city]

I then tried a regex to strip the brackets:

df['noun'].replace('[^a-zA-Z0-9]', '', regex = True)

and I get the same result again:

0     [noun]
1    [Alice]
2         []
3         []
4     [city]
Name: noun, dtype: object

What's wrong?

score 2 · Accepted Answer · answered Sep 06 '16 at 12:52

The brackets mean that each cell of the column holds a list, not a string. A regex replace only acts on string values, so it leaves those cells unchanged. If you are sure each list contains at most one element, you can use the .str accessor on the noun column to extract the first element:

df['noun'] = df.noun.str[0]

df
#      pos   noun
# 0   noun   noun
# 1  Alice  Alice
# 2   good    NaN
# 3   well    NaN
# 4   city   city
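
To check that the cells really are lists, here is a minimal sketch. It assumes a hand-built DataFrame that mirrors the question's output, so no NLTK data is needed; the names cell_types and first are only for illustration.

```python
import pandas as pd

# Hand-built stand-in for the question's result (an assumption made so the
# example runs without downloading NLTK tagger data).
df = pd.DataFrame({
    'pos': ['noun', 'Alice', 'good', 'well', 'city'],
    'noun': [['noun'], ['Alice'], [], [], ['city']],
})

# Every cell is a Python list, not a string, which is why a regex
# replace on this column has nothing to substitute.
cell_types = set(df['noun'].map(type))
print(cell_types)

# .str[0] indexes into each list; an empty list has no element 0,
# so that row becomes NaN.
first = df['noun'].str[0]
print(first)
```

The two rows that held empty lists come back as NaN, matching the output shown above.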

Psidom


what if there are multiple elements? – StatguyUser May 30 '17 at 13:36
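
For the multiple-element case raised in this comment, one possible approach (not part of the accepted answer) is either to join each list into a single string or to spread the list over several rows. The sketch below assumes a small made-up column and pandas 0.25 or later for explode(); the names joined and exploded are hypothetical.

```python
import pandas as pd

# Made-up data for illustration: lists of varying length, including an empty one.
df = pd.DataFrame({'noun': [['cat', 'dog'], ['city'], []]})

# Option 1: collapse each list into one comma-separated string.
# An empty list becomes an empty string.
joined = df['noun'].map(', '.join)
print(joined)

# Option 2: one row per noun, keeping the original index label.
# An empty list becomes a single NaN row.
exploded = df['noun'].explode()
print(exploded)
```

Joining keeps one row per original record, while explode() is more convenient if you want to count or group individual nouns afterwards.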
