
I need to melt groups of initial columns into multiple target columns in a poorly normalized dataset. Here is an example (from the question "pandas dataframe reshaping/stacking of multiple value variables into seperate columns"):

         des1 des2 des3 interval1 interval2 interval3
value   
aaa       a    b    c     ##1         ##2       ##3
bbb       d    e    f     ##4         ##5       ##6
ccc       g    h    i     ##7         ##8       ##9

I am trying to melt it into this orientation:

         des      interval
value   
aaa       a         ##1
aaa       b         ##2
aaa       c         ##3
bbb       d         ##4
bbb       e         ##5
bbb       f         ##6
ccc       g         ##7
ccc       h         ##8
ccc       i         ##9

I was hoping to use melt instead of stack to avoid manually subsetting a lot of data. Here is what I have so far:

import fnmatch

import pandas as pd

# df_initial is the wide source DataFrame (not shown here)
column_list = list(df_initial.columns)

# fnmatch.filter already returns a list, so no list comprehension is needed
question_sources = fnmatch.filter(column_list, "measure*question*source")
question_ranks = fnmatch.filter(column_list, "measure*rank")
question_targets = fnmatch.filter(column_list, "measure*targeted")
# note: "measure*status" also matches "measureInfo_status"
question_statuses = fnmatch.filter(column_list, "measure*status")

place = fnmatch.filter(column_list, "place")
measure_statuses = fnmatch.filter(column_list, "measureInfo_status")

# identifier columns repeated on every melted row
starter_list = place + measure_statuses

# melts only the question_sources group into one variable/value column pair
df_gpro_melt_1 = pd.melt(df_initial, id_vars=starter_list,
                         value_vars=question_sources, var_name="question_sources",
                         value_name="question_sources_values")

Is it possible to melt groups of initial columns into multiple target columns? Any advice is much appreciated.
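To stay with melt, one option is to melt each column group separately and put the resulting value columns side by side. A minimal sketch on the example frame, assuming every group has the same number of columns in the same suffix order (des1/interval1, des2/interval2, ...); the names des_long, int_long and result are only for illustration:

```python
import pandas as pd

# Example frame from the question, with "value" as the index
df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

flat = df.reset_index()

# Melt each column group on its own; both calls emit rows in the same order
# (every row for column 1, then column 2, then column 3).
des_long = flat.melt(id_vars="value",
                     value_vars=["des1", "des2", "des3"],
                     value_name="des")
int_long = flat.melt(id_vars="value",
                     value_vars=["interval1", "interval2", "interval3"],
                     value_name="interval")

# Because the row order matches, the value columns can sit side by side.
result = des_long[["value", "des"]].assign(interval=int_long["interval"])

# A stable sort groups rows by "value" while keeping the 1, 2, 3 order.
result = result.sort_values("value", kind="mergesort").set_index("value")
print(result)
```

This relies on the two melts producing rows in the same order, so it only holds when each group's columns are listed in matching suffix order.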

Pylander

3 Answers


I know this has been answered already, but:

>>> df
      des1 des2 des3 interval1 interval2 interval3
value                                             
aaa      a    b    c       ##1       ##2       ##3
bbb      d    e    f       ##4       ##5       ##6
ccc      g    h    i       ##7       ##8       ##9

>>> pd.wide_to_long(df.reset_index(), ['des', 'interval'], i='value', j='id')
         des interval
value id             
aaa   1    a      ##1
bbb   1    d      ##4
ccc   1    g      ##7
aaa   2    b      ##2
bbb   2    e      ##5
ccc   2    h      ##8
aaa   3    c      ##3
bbb   3    f      ##6
ccc   3    i      ##9

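Then use .reset_index(level=1, drop=True) if you want to get rid of the id index level. The rows come out ordered by id, so call .sort_index() first if you want them grouped by value as in the question.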
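A self-contained run of the wide_to_long call above, rebuilding the example frame and sorting before dropping the id level so the rows come out grouped by value (the names long_df and tidy are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Stub names "des" and "interval"; the numeric suffix becomes the "id" level.
long_df = pd.wide_to_long(df.reset_index(), ["des", "interval"], i="value", j="id")

# Sort by (value, id) so each value's rows are together, then drop "id".
tidy = long_df.sort_index().reset_index(level="id", drop=True)
print(tidy)
```

wide_to_long requires the i column to identify each row uniquely, which "value" does here.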

kcdgkn

If your columns follow the same pattern as in your example DataFrame, this should work:

import pandas as pd

# pair column i (des1..des3) with column i + 3 (interval1..interval3)
pd.concat(pd.DataFrame({'des': df.iloc[:, i],
                        'interval': df.iloc[:, i + 3]})
          for i in range(3))
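Because concat keeps each piece's original index, the result is grouped by column pair (all des1 rows first, then des2, and so on). A sketch that adds a stable sort on the index to get the value-grouped order from the question, assuming the six columns sit in the order shown:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# Pair column i (des) with column i + 3 (interval) and stack the pieces.
pieces = (pd.DataFrame({"des": df.iloc[:, i], "interval": df.iloc[:, i + 3]})
          for i in range(3))
stacked = pd.concat(pieces)

# mergesort is stable, so rows sharing an index label keep their 1, 2, 3 order.
result = stacked.sort_index(kind="mergesort")
print(result)
```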

If the column pairs are laid out differently, you can use the same pattern but iterate through a list of (des, interval) column positions:

tuples = [(0, 3), (1, 4), (2, 5)]  # (des position, interval position)

pd.concat(pd.DataFrame({'des': df.iloc[:, i],
                        'interval': df.iloc[:, j]})
          for i, j in tuples)
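If column positions are not reliable, the pairs could instead be built from the names. A sketch with a hypothetical helper, columns_with_prefix, assuming every relevant column is a prefix followed by an integer suffix and each des column has an interval column with the same number:

```python
import pandas as pd

# Columns deliberately shuffled to show that positions no longer matter
df = pd.DataFrame(
    {
        "interval2": ["##2", "##5", "##8"],
        "des1": ["a", "d", "g"],
        "interval1": ["##1", "##4", "##7"],
        "des3": ["c", "f", "i"],
        "interval3": ["##3", "##6", "##9"],
        "des2": ["b", "e", "h"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)


def columns_with_prefix(frame, prefix):
    """Hypothetical helper: columns named prefix + integer, sorted by the integer."""
    cols = [c for c in frame.columns
            if c.startswith(prefix) and c[len(prefix):].isdigit()]
    return sorted(cols, key=lambda c: int(c[len(prefix):]))


# Pair des<n> with interval<n> by position in the two suffix-sorted lists.
pairs = list(zip(columns_with_prefix(df, "des"),
                 columns_with_prefix(df, "interval")))

result = pd.concat(
    pd.DataFrame({"des": df[d], "interval": df[iv]}) for d, iv in pairs
).sort_index(kind="mergesort")
print(result)
```

Sorting by the integer rather than the raw string keeps des10 after des9 if a dataset has ten or more columns per group.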
maxymoo

I guess I found an ugly way to do that!

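In [12]: pd.DataFrame(
             # ravel() reads row by row, so each row's 3 des values line up with its 3 intervals
             data={'desc': df.values[:, 0:3].ravel(),
                   'interval': df.values[:, 3:6].ravel()},
             # pd.np is deprecated/removed in recent pandas; Index.repeat gives aaa, aaa, aaa, bbb, ...
             index=df.index.repeat(3).to_numpy())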
Out[12]: 
    desc interval
aaa    a      ##1
aaa    b      ##2
aaa    c      ##3
bbb    d      ##4
bbb    e      ##5
bbb    f      ##6
ccc    g      ##7
ccc    h      ##8
ccc    i      ##9

But I'm pretty sure there is a more elegant way using other functions, such as pandas.MultiIndex (to group your interval1, interval2 and interval3 columns into an "interval" level) and/or pandas.melt (or maybe the stack method).
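One way the MultiIndex idea might look: split each column name into a (stub, number) pair, then stack the number level into the rows. A sketch assuming all column names are letters followed by digits; newer pandas versions may emit a FutureWarning about the stack implementation, which does not change the result here:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "des1": ["a", "d", "g"],
        "des2": ["b", "e", "h"],
        "des3": ["c", "f", "i"],
        "interval1": ["##1", "##4", "##7"],
        "interval2": ["##2", "##5", "##8"],
        "interval3": ["##3", "##6", "##9"],
    },
    index=pd.Index(["aaa", "bbb", "ccc"], name="value"),
)

# "des1" -> ("des", 1), "interval3" -> ("interval", 3)
parts = df.columns.str.extract(r"^(\D+)(\d+)$")
wide = df.copy()
wide.columns = pd.MultiIndex.from_arrays(
    [parts[0], parts[1].astype(int)], names=[None, "id"]
)

# Stacking the "id" level moves the numbers into the row index,
# leaving one column per stub: "des" and "interval".
result = wide.stack(level="id").reset_index(level="id", drop=True)
print(result)
```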

mgc