Reformat pandas DataFrame containing tokens so each token is on its own row

Question

Suppose I have a pandas DataFrame raw_corpus with columns 'unique_ID' and 'tokenized_recipes', as follows:

    unique_ID   tokenized_recipes
0   11530       ['photo', 'video', '500px', 'new', 'photo', 'from', 'anyone', 'tagged', 'with', 'phrase', 'change', 'new', 'tab', 'background', 'google', 'chrome', 'other']
1   17176       ['environment', 'control', 'monitoring', 'nest', 'protect', 'smoke', 'alarm', 'warning', 'activate', 'shortcut', 'wink', 'shortcuts', 'smart', 'hubs', 'systems']
2   6984        ['security', 'monitoring', 'systems', 'dlink', 'motion', 'sensor', 'motion', 'detected', 'post', 'to', 'channel', 'slack', 'communication']

I would like to reorganize this data and write it to a tab-delimited file so that it looks like this:

unique_ID   tokenized_recipes
11530       'photo'
11530       'video'
11530       '500px'
11530       'new'
 ...
17176       'environment'
17176       'control'
 ...
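
For reference, one way the reshaping described above might be done is with DataFrame.explode, which is available in pandas 0.25 and later. This is only a minimal sketch: the small raw_corpus below is a shortened stand-in for the real data, it assumes the column really holds Python lists, and the names long_form and tsv_text are made up for illustration.

```python
import pandas as pd

# Shortened stand-in for raw_corpus; assumes each cell is a real Python list.
raw_corpus = pd.DataFrame({
    "unique_ID": [11530, 17176],
    "tokenized_recipes": [
        ["photo", "video", "500px"],
        ["environment", "control"],
    ],
})

# explode() emits one row per list element and repeats unique_ID alongside it.
long_form = raw_corpus.explode("tokenized_recipes").reset_index(drop=True)

# With no path given, to_csv() returns the text; sep="\t" makes it tab-delimited.
tsv_text = long_form.to_csv(sep="\t", index=False)
print(tsv_text)
```

Passing a file path as the first argument to to_csv would write the same tab-delimited text to disk instead of returning it.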

I tried two of the solutions from the question linked above (the one with 11 responses). I re-ordered the columns of my DataFrame to match the order those solutions expect.

The 'tokenized_recipes' column of my DataFrame already holds lists.

The more complicated generic solution raises an error saying that I have a zero-dimensional array.

Then I attempted to explode the DataFrame id_token with the code below and got NameError: name 'Series' is not defined.

# now explode the dataframe id_token string entry to separate rows
pd.concat([Series(row['unique_ID'], row['tokenized_recipes'].split(','))
           for _, row in id_token.iterrows()]).reset_index()
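
The NameError comes from using the bare name Series when only pandas itself has been imported as pd; writing pd.Series (or importing Series explicitly) avoids it. Separately, if 'tokenized_recipes' already holds lists, calling .split(',') on a cell would fail, because lists have no split method. A minimal sketch of the same concat approach with both points addressed, assuming list-valued cells and using a shortened stand-in for id_token (the name exploded is made up for illustration):

```python
import pandas as pd

# Shortened stand-in for id_token; assumes each cell is a real Python list.
id_token = pd.DataFrame({
    "unique_ID": [11530, 6984],
    "tokenized_recipes": [
        ["photo", "new", "photo"],
        ["security", "slack"],
    ],
})

# pd.Series(scalar, index=tokens) repeats the ID once per token,
# so the tokens end up in the index and the ID in the values.
exploded = pd.concat(
    [pd.Series(row["unique_ID"], index=row["tokenized_recipes"])
     for _, row in id_token.iterrows()]
).reset_index()

# reset_index() names the columns "index" and 0; rename and reorder them.
exploded.columns = ["tokenized_recipes", "unique_ID"]
exploded = exploded[["unique_ID", "tokenized_recipes"]]
print(exploded)
```

Repeated tokens such as 'photo' are kept as separate rows, since a pandas index is allowed to contain duplicates.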

It would help a lot if you paste the code which you tried so far. — Igor Nikolaev, Jan 25 '18 at 22:10
Also is your source data in Pandas DataFrame format originally or is it something else? — Igor Nikolaev, Jan 25 '18 at 22:11
Thanks for pointing out the dupes. I searched and couldn't find them. I'll take a look. — profhoff, Jan 25 '18 at 22:27
Can you paste your code? From the question it's not clear whether you have to use DataFrame at all, or maybe the problem can be solved differently. — Igor Nikolaev, Jan 25 '18 at 22:30
