1

Very new to Pandas and probably been answered somewhere but I can't seem to find exactly what I'm looking for. Assuming my dataset has this type of structure

Animal |  Age  |  Color  | Length

Cat       1       Brown       50cm
Cat       2       White       60cm
Cat       3       Brown       55cm
Dog       1       White       99cm
Dog       2       White       129cm
Dog       3       White       105cm

How can I most easily transform it to this format where the existing columns are appended horizontally rather than being ordered vertical for a specific animal

Animal |  Age_1  |  Color_1  | Length_1 |  Age_2 | Color_2 | Length_2 | Age_3 | Color_3 | Length_3

Cat       1        Brown       50cm         2       White      60cm       3       Brown      55cm
Dog       1        White       99cm         2       White      129cm      3       White      105cm

Maybe not the best example labels to use, but hopefully gets the point across I greatly appreciate links to answers too.

user21398
  • 419
  • 6
  • 13

1 Answers1

2

Create MultiIndex by GroupBy.cumcount and DataFrame.set_index, then reshape by DataFrame.unstack and sortinf second level of MultiIndex in columns, then flatten it with f-strings and convert index to column:

df1 = (df.set_index(['Animal', df.groupby('Animal').cumcount().add(1)])
         .unstack()
         .sort_index(axis=1, level=1))
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
df1 = df1.reset_index()
print (df1)
  Animal  Age_1 Color_1 Length_1  Age_2 Color_2 Length_2  Age_3 Color_3  \
0    Cat      1   Brown     50cm      2   White     60cm      3   Brown   
1    Dog      1   White     99cm      2   White    129cm      3   White   

  Length_3  
0     55cm  
1    105cm  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I never find a nice way to method chain 'flatten multiindex into a single one'. Any suggestion? – Mark Wang May 19 '20 at 13:13
  • @MarkWang - The best should be if support `rename`, but it working with all columns names separately... – jezrael May 19 '20 at 13:15
  • 1
    Also expand single header to multindex index header. It used to work with `rename` in the order version of pandas, now it just gives you a tuple.. – Mark Wang May 19 '20 at 13:15
  • Exactly. Totally agree – Mark Wang May 19 '20 at 13:15
  • @MarkWang - [link](https://github.com/pandas-dev/pandas/issues/34254) – jezrael May 19 '20 at 13:40
  • 1
    Awesome, though multiindex to_flat_index only gives you tuple. But I guess you can chain it with `df.rename(columns='{0[0]}_{0[1]}'.format)` to flatten the tuple – Mark Wang May 19 '20 at 13:46
  • @MarkWang - I got `TypeError: 'int' object is not subscriptable` – jezrael May 19 '20 at 13:49
  • 1
    Try the following? `index = pd.MultiIndex.from_product( [['foo', 'bar'], ['baz', 'qux']],names=['a', 'b']) pd.Series([1,2,3,3], index=index.to_flat_index()).rename('{0[0]}_{0[1]}'.format)` – Mark Wang May 19 '20 at 13:50
  • ah okay. I can't think of anything but `pipe` and `set_axis` combination. – Mark Wang May 19 '20 at 13:54