I want to extract nouns from a DataFrame. I do it as below:

import pandas as pd
import nltk
from nltk.tag import pos_tag
df = pd.DataFrame({'pos': ['noun', 'Alice', 'good', 'well', 'city']})
noun=[]
for index, row in df.iterrows():
    noun.append([word for word,pos in pos_tag(row) if pos == 'NN'])
df['noun'] = noun  

and I get this for df['noun']:

0     [noun]
1    [Alice]
2         []
3         []
4     [city]

I then try to strip the brackets with a regex:

df['noun'].replace('[^a-zA-Z0-9]', '', regex = True)

but the output is the same again:

0     [noun]
1    [Alice]
2         []
3         []
4     [city]
Name: noun, dtype: object

What's wrong?

Edward

1 Answer

The brackets mean that each cell of the column holds a list, not a string. Regex replacement only acts on string values, so it leaves the lists untouched. If you are sure there is at most one element in each list, you can use the str accessor on the noun column to extract the first element:

df['noun'] = df.noun.str[0]

df
#    pos    noun
#0  noun    noun
#1  Alice   Alice
#2  good    NaN
#3  well    NaN
#4  city    city
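
If you want to check this without NLTK installed, here is a minimal sketch. It assumes the lists are hard-coded to match the output in the question (the pos_tag step is skipped), and the variable names unchanged, first and joined are only for illustration. It also shows str.join as one possible alternative in case a list could hold more than one noun, which is an assumption beyond the original question:

```python
import pandas as pd

# Stand-in for the pos_tag result: every cell of 'noun' holds a list.
# The lists are hard-coded here to mirror the output shown in the question.
df = pd.DataFrame({
    'pos': ['noun', 'Alice', 'good', 'well', 'city'],
    'noun': [['noun'], ['Alice'], [], [], ['city']],
})

# Regex replacement only rewrites string cells, so the lists come back as they were.
unchanged = df['noun'].replace('[^a-zA-Z0-9]', '', regex=True)

# .str[0] picks the first element of each list; empty lists become NaN.
first = df['noun'].str[0]

# .str.join concatenates all elements of each list; empty lists become ''.
joined = df['noun'].str.join(' ')

print(first)
print(joined)
```

Which of the two you pick depends on whether you would rather have NaN or an empty string for rows without a noun.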
Psidom