
I have a pandas.DataFrame named my_df:

[image: screenshot of my_df with columns q_id, body, and tag]

I was trying to "ungroup" the lists in the tag column into multiple rows, with help from this answer.

However, when I try this code (the same as in the answer from the linked post):

my_df.reset_index(inplace=True, drop=True)

rows = []
my_df.apply(lambda row: [rows.append([row['q_id'], row['body'], t]) 
                               for t in row.tag], axis=1)

df_new = pd.DataFrame(rows, columns=my_df.columns)

I get an error, ValueError: could not broadcast input array from shape (2) into shape (3), which is triggered by this line (number 9):

[image: screenshot highlighting the failing line]

What am I doing wrong?

PeterB

2 Answers

I've made a change to your code and it works now. The problem is that apply tries to construct a new DataFrame from the values your lambda returns. Here the lambda returns a list whose length differs from row to row, because each tag list holds a different number of tags, and pandas cannot broadcast those into a fixed number of columns. Wrapping the list comprehension in len() makes the lambda always return a single value per row, which works around the problem.

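import pandas as pd

rows = []
my_df = pd.DataFrame(data={'q_id':range(5),'body':['abc']*5,
                        'tag':[['pset1','maria','check50'],
                               ['maria', 'pset1'],
                               ['greedy','pset1'],
                               ['pset'],
                               ['pset']]})
# len() makes the lambda return one scalar per row, so apply has nothing to broadcast
my_df.apply(lambda row: len([rows.append([row['q_id'], row['body'], t]) for t in row.tag]), axis=1)
# Name the columns in the same order the values were appended above
df_new = pd.DataFrame(rows, columns=['q_id', 'body', 'tag'])
df_new
Out[63]: 
   q_id body      tag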
0     0  abc    pset1
1     0  abc    maria
2     0  abc  check50
3     1  abc    maria
4     1  abc    pset1
5     2  abc   greedy
6     2  abc    pset1
7     3  abc     pset
8     4  abc     pset
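
Since apply is used here only for its side effect, a plain comprehension sidesteps the issue, because nothing is returned for pandas to assemble. A minimal sketch, assuming the same q_id, body, and tag columns as above (the small frame is illustrative, not the asker's data):

```python
import pandas as pd

# Illustrative data with the same three columns as in the question
my_df = pd.DataFrame({'q_id': [0, 1, 2],
                      'body': ['abc'] * 3,
                      'tag': [['pset1', 'maria', 'check50'],
                              ['maria', 'pset1'],
                              ['pset']]})

# One output row per (question, tag) pair; no apply, so no return value to broadcast
rows = [(q_id, body, t)
        for q_id, body, tags in zip(my_df['q_id'], my_df['body'], my_df['tag'])
        for t in tags]

# Column names are given explicitly, in the order the tuples were built
df_new = pd.DataFrame(rows, columns=['q_id', 'body', 'tag'])
print(df_new)
```

Naming the columns explicitly also keeps the labels correct regardless of the order in which my_df.columns happens to be stored.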
Allen Qin

Do you mind if I present you a different approach?

Generate data:

import pandas as pd

my_df = pd.DataFrame({'q_id':[1036.0,1039.0,1089.0,1103.0,1125.0],
                      'body':['Mario Pyramid - Check 50',
                              "What's wrong wth my code?",
                              'Why do I get errors',
                              'How does a person make',
                              'How do I fix'],
                      'tag':[['pest1','mario','check50'],
                             ['mario','pset1'],
                             ['greedy','pset1'],
                             ['pset1'],
                             ['pset1']]})

Do "ungrouping":

df_new = pd.merge(my_df,
        (my_df['tag'].apply(lambda x: pd.Series(x)).T
             .unstack().reset_index(level=-1, drop=True)
             .dropna().to_frame()),
        left_index = True,
        right_index = True).drop('tag', axis=1)

df_new = df_new.rename(columns={0:'tag'})[['q_id','body','tag']]

print(df_new)

Output:

     q_id                       body      tag
0  1036.0   Mario Pyramid - Check 50    pest1
0  1036.0   Mario Pyramid - Check 50    mario
0  1036.0   Mario Pyramid - Check 50  check50
1  1039.0  What's wrong wth my code?    mario
1  1039.0  What's wrong wth my code?    pset1
2  1089.0        Why do I get errors   greedy
2  1089.0        Why do I get errors    pset1
3  1103.0     How does a person make    pset1
4  1125.0               How do I fix    pset1
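
On newer pandas versions (0.25 and later, which postdate this answer), DataFrame.explode should do the same ungrouping in one call. This is a different technique from the merge above, sketched here on a shortened copy of the data:

```python
import pandas as pd

# Shortened copy of the data above (three of the five rows)
my_df = pd.DataFrame({'q_id': [1036.0, 1039.0, 1103.0],
                      'body': ['Mario Pyramid - Check 50',
                               "What's wrong wth my code?",
                               'How does a person make'],
                      'tag': [['pest1', 'mario', 'check50'],
                              ['mario', 'pset1'],
                              ['pset1']]})

# explode emits one row per list element and repeats the other columns;
# the original index labels are kept, as in the merge output above
df_new = my_df.explode('tag')
print(df_new)
```

If a fresh 0..n-1 index is preferred, calling reset_index(drop=True) on the result is one option.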
Scott Boston
  • Thanks. Maybe a little more complicated, but it works. However, I am still curious why my approach doesn't work when it is exactly the same as in the mentioned answer. – PeterB Apr 23 '17 at 20:42