I am trying to do some topic analysis, and I need to collapse a DataFrame column made up of lists into a single list of words.

So here's an approximation of what my data looks like:

import pandas as pd
d = {'Case': ["[wait, information, employer]","[case, assign, priority, level, 2, transmit]" ]}
df2 = pd.DataFrame(data=d)

I would like to end up with a single list, like list = ['wait', 'information', 'case', 'assign', 'priority', 'level']

Scott Hunter
Ozymandias

2 Answers

1

If I understand correctly, you could do something like this to get a list from your column:

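import nltk  # word_tokenize needs NLTK's tokenizer data, e.g. nltk.download('punkt')
token = []
# list.append returns None, so extend the list in place instead of reassigning it
for case in df2['Case']:
    # keep only alphanumeric tokens, which drops the brackets and commas
    token.extend(t for t in nltk.word_tokenize(case) if t.isalnum())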
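If installing NLTK and its tokenizer data is not an option, a regular expression may be enough for strings this simple. This is only a sketch, assuming every word is made of letters, digits or underscores; the function name tokenize_cases and the plain list cases are illustrative stand-ins, not part of the original post.

```python
import re

# Stand-in for the 'Case' column from the question.
cases = ["[wait, information, employer]",
         "[case, assign, priority, level, 2, transmit]"]


def tokenize_cases(cases):
    """Hypothetical helper: collect every word-like token into one flat list."""
    tokens = []
    for case in cases:
        # \w+ matches runs of letters, digits and underscores,
        # so brackets, commas and spaces are skipped.
        tokens.extend(re.findall(r"\w+", case))
    return tokens


tokens = tokenize_cases(cases)
print(tokens)
```

Since a pandas Series iterates over its values, the same function should also accept df2['Case'] directly.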
slb20
1
df2["CaseList"] = df2["Case"].apply(lambda x: [w.strip() for w in x.strip("[]").split(",")])

Does that work for you?
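That gives one list per row. To get the single list the question asks for, the rows still have to be chained together. A minimal sketch, assuming each value is a bracketed, comma-separated string as in the question; parse_case, flatten_cases and the plain list cases are hypothetical names used only for illustration.

```python
from itertools import chain

# Stand-in for the 'Case' column from the question.
cases = ["[wait, information, employer]",
         "[case, assign, priority, level, 2, transmit]"]


def parse_case(text):
    """Hypothetical helper: turn '[a, b, c]' into ['a', 'b', 'c']."""
    # Strip the surrounding brackets, split on commas, trim each word,
    # and drop empty pieces (e.g. from an empty '[]').
    return [w.strip() for w in text.strip("[]").split(",") if w.strip()]


def flatten_cases(cases):
    """Parse every string and chain the per-row lists into one flat list."""
    return list(chain.from_iterable(parse_case(c) for c in cases))


words = flatten_cases(cases)
print(words)
```

With the DataFrame from the question, flatten_cases(df2["Case"]) should work the same way, because iterating over a Series yields its values.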

Matt Camp