First convert the string column to lists; I use ast.literal_eval for that. Then flatten the list of lists with a list comprehension, pass the result to set to drop duplicates, and finally create a new DataFrame with the constructor:
import ast
print (type(df.loc[0, 'Description']))
<class 'str'>
df.Description = df.Description.apply(ast.literal_eval)
print (type(df.loc[0, 'Description']))
<class 'list'>
#http://stackoverflow.com/q/952914/2901002
# flatten the lists and keep unique values (a set has no guaranteed order)
unique_data = list(set([item for sublist in df.Description.tolist() for item in sublist]))
print (unique_data)
['refused', 'jumped', 'go', 'roof', 'come', 'beautiful',
'paris', 'york', 'lets', 'new', 'boy', 'party']
print (pd.DataFrame({'Unique Words': unique_data}))
Unique Words
0 refused
1 jumped
2 go
3 roof
4 come
5 beautiful
6 paris
7 york
8 lets
9 new
10 boy
11 party
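For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample df is my reconstruction from the printed output in this answer, so treat it as an assumption about the original data. I also sort the result, which is not in the code above, only to make the order reproducible.

```python
import ast
import pandas as pd

# Assumed sample data: each cell holds a list written out as a string.
df = pd.DataFrame({'Description': ["['boy']",
                                   "['boy', 'jumped', 'roof']",
                                   "['paris']",
                                   "['paris', 'beautiful', 'new', 'york']",
                                   "['lets', 'go', 'party']",
                                   "['refused', 'come', 'party']"]})

# Parse each string into a real Python list.
df['Description'] = df['Description'].apply(ast.literal_eval)

# Flatten the lists, drop duplicates with a set, then sort for a stable order.
unique_data = sorted({item for sublist in df['Description'] for item in sublist})

result = pd.DataFrame({'Unique Words': unique_data})
print(result)
```

If the original order of first appearance matters, a set is probably not the right tool; something like pd.unique on the flattened list would keep that order.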
Another solution without ast:
# strip the outer brackets, then split on commas (items still carry quotes and spaces)
df.Description = df.Description.str.strip('[]').str.split(',')
print (df)
Description
0 ['boy']
1 ['boy', 'jumped', 'roof']
2 ['paris']
3 ['paris', 'beautiful', 'new', 'york']
4 ['lets', 'go', 'party']
5 ['refused', 'come', 'party']
# remove the leftover spaces and quotes from each item while flattening
unique_data = list(set([item.strip().strip("'") for sublist in df.Description.tolist() for item in sublist]))
print (unique_data)
['refused', 'jumped', 'go', 'roof', 'come', 'beautiful',
'paris', 'york', 'lets', 'new', 'boy', 'party']
print (pd.DataFrame({'Unique Words': unique_data}))
Unique Words
0 refused
1 jumped
2 go
3 roof
4 come
5 beautiful
6 paris
7 york
8 lets
9 new
10 boy
11 party
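The string-only variant can be sketched the same way. Again the sample data is an assumption reconstructed from the output shown here, the names tokens and unique_words are just mine for illustration, and the sort is only there to get a stable order.

```python
import pandas as pd

# Assumed sample data: each cell holds a list written out as a string.
df = pd.DataFrame({'Description': ["['boy']",
                                   "['boy', 'jumped', 'roof']",
                                   "['paris']",
                                   "['paris', 'beautiful', 'new', 'york']",
                                   "['lets', 'go', 'party']",
                                   "['refused', 'come', 'party']"]})

# Strip the surrounding brackets, then split each string on commas.
tokens = df['Description'].str.strip('[]').str.split(',')

# Every token still has spaces and quotes around it, so clean it while flattening.
unique_words = sorted({tok.strip().strip("'") for row in tokens for tok in row})

print(pd.DataFrame({'Unique Words': unique_words}))
```

Keep in mind that this version only does plain string splitting: a word that itself contains a comma would be cut in two, and a cell like '[]' ends up as an empty string in the result. In those cases ast.literal_eval is the safer choice.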