3

Hi I'm trying to reshape a data frame in a certain way.

this is the data frame I have,

         des1 des2 des3 interval1 interval2 interval3
value   
aaa       a    b    c     ##1         ##2       ##3
bbb       d    e    f     ##4         ##5       ##6
ccc       g    h    i     ##7         ##8       ##9

des1 corresponds with interval1 and so on. interval columns have a range of dates and des columns have descriptions.

I'd like to reshape the dataframe such that it looks like this:

         des      interval
value   
aaa       a         ##1
aaa       b         ##2
aaa       c         ##3
bbb       d         ##4
bbb       e         ##5
bbb       f         ##6
ccc       g         ##7
ccc       h         ##8
ccc       i         ##9

How would I go about doing this? I'm a little familar with .stack() but I haven't been able to get exactly what I wanted.

Thank you for your help. feel free to post references.

chungkim271
  • 927
  • 1
  • 10
  • 20
  • This is a good question, that is not covered by http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html Since it deals with multiple columns that need to be stacked into two seperate columns. Perhaps the title should reflect that "pandas dataframe reshaping/stacking of multiple value variables into seperate columns" - – Timo Kvamme Aug 20 '15 at 12:59
  • per your recommendation, i changed the title. i hope people benefit from the helpful answers below. – chungkim271 Aug 20 '15 at 14:03

4 Answers4

2

This might be a shorter approach:

[72]:

df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[:-1], x), df.columns))
In [73]:

print pd.DataFrame({key:df[key].stack().values for key in set(df.columns.get_level_values(0))},
                   index = df['des'].stack().index.get_level_values(0))
      des interval
value             
aaa     a      ##1
aaa     b      ##2
aaa     c      ##3
bbb     d      ##4
bbb     e      ##5
bbb     f      ##6
ccc     g      ##7
ccc     h      ##8
ccc     i      ##9

Or preserve the 1,2,3 info:

[73]:

df.columns = pd.MultiIndex.from_tuples(map(lambda x: (x[:-1], x[-1]), df.columns))
Keys = set(df.columns.get_level_values(0))
df2  = pd.concat([df[key].stack() for key in Keys], axis=1)
df2.columns = Keys
print df2
        des interval
value               
aaa   1   a      ##1
      2   b      ##2
      3   c      ##3
bbb   1   d      ##4
      2   e      ##5
      3   f      ##6
ccc   1   g      ##7
      2   h      ##8
      3   i      ##9
CT Zhu
  • 52,648
  • 17
  • 120
  • 133
1

This is just a .melt, docs are here

In [33]: pd.melt(df.reset_index(),
                 id_vars=['values'],
                 value_vars=['interval1','interval2','interval3'])
Out[33]: 
  values   variable value
0    aaa  interval1   ##1
1    bbb  interval1   ##4
2    ccc  interval1   ##7
3    aaa  interval2   ##2
4    bbb  interval2   ##5
5    ccc  interval2   ##8
6    aaa  interval3   ##3
7    bbb  interval3   ##6
8    ccc  interval3   ##9
Jeff
  • 125,376
  • 21
  • 220
  • 187
  • `.melt` only applies to aggregating a group of columns into one variable. The OP's example solution needs two. – oLas Dec 02 '16 at 12:12
1

I think the solution provided by CT Zhu is very genius. But you also can reshape this step by step (maybe this is the common way).

 d = {'des1' : ['', 'a', 'd', 'g'],
     'des2' : ['', 'b', 'e', 'h'],
     'des3' : ['', 'c', 'f', 'i'],
     'interval1' : ['', '##1', '##4', '##7'],
     'interval2' : ['', '##2', '##5', '##6'],
     'interval3' : ['', '##3', '##6', '##9']}

df = pd.DataFrame(d, index=['value', 'aaa', 'bbb', 'ccc'], 
                  columns=['des1', 'des2', 'des3', 'interval1', 'interval2', 'interval3'])

nd = {'des' : [''] + df.iloc[1, 0:3].tolist() + df.iloc[2, 0:3].tolist() + df.iloc[3, 0:3].tolist(),
      'interval' : ['']+ df.iloc[1, 3:6].tolist() + df.iloc[2, 3:6].tolist() + df.iloc[3, 3:6].tolist()}

ndf = pd.DataFrame(nd, index=['value', 'aaa', 'aaa', 'aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ccc'], columns=['des', 'interval'])
Community
  • 1
  • 1
Fei Yuan
  • 82
  • 5
1

This type of reshaping can be done conveniently with pandas.wide_to_long:

import io
import pandas as pd  # v 1.2.3

data = '''
value des1 des2 des3 interval1 interval2 interval3  
aaa  a  b  c ##1 ##2 ##3
bbb  d  e  f ##4 ##5 ##6
ccc  g  h  i ##7 ##8 ##9
'''
df = pd.read_csv(io.StringIO(data), index_col=0, delim_whitespace=True)

pd.wide_to_long(df.reset_index(), stubnames=['des', 'interval'],
                i='value', j='var_id').droplevel(1).sort_index()

wide_to_long

Patrick FitzGerald
  • 3,280
  • 2
  • 18
  • 30