Melt group of columns into target columns by name

Question

This question builds on Pandas melt several groups of columns into multiple target columns by name. However, my data is organised in an annyoing way

import pandas as pd

df = pd.DataFrame([(101, 'a', 'b', 'c', 'd', 'e', 'f', 1, 2, 3, 4, 5, 6, 'aa', 'bb', 'cc', 'dd', 'ee', 'ff'),
                   (102,'g', 'h', 'i', 'j', 'k', 'l' , 7, 8, 9, 10, 11, 12, 'gg', 'hh', 'ii', 'jj', 'kk', 'll')], 
                   columns=['id','a__1', 'a__2', 'a__3', 'a_1', 'a_2', 'a_3','b__1', 'b__2', 'b__3', 'b_1', 'b_2', 'b_3','c__1', 'c__2', 'c__3', 'c_1', 'c_2', 'c_3'])

df

And the dataframe looks like this:

    id a__1 a__2 a__3 a_1 a_2 a_3  b__1  b__2  b__3  b_1  b_2  b_3 c__1 c__2  \
0  101    a    b    c   d   e   f     1     2     3    4    5    6   aa   bb   
1  102    g    h    i   j   k   l     7     8     9   10   11   12   gg   hh   

  c__3 c_1 c_2 c_3  
0   cc  dd  ee  ff  
1   ii  jj  kk  ll

As you can see instead of all the seperators only being one underscore they are sometimes two underscores.

In the end I would like to organise my data to be in the following way, note that the a__1 should come before a_1 and so on:

     id   a   b   c
0   101   a   1   aa
1   101   b   2   bb
2   101   c   3   cc
3   101   d   4   dd
4   101   e   5   ee
5   101   f   6   ff   
6   102   g   7   gg
7   102   h   8   hh
8   102   i   9   ii
9   102   j   10  jj
10  102   k   11  kk
11  102   l   12  ll

Basically my problem is that instead of only one underscore I sometimes have 2. Now this doesn't work with the pd.wide_to_long function since I cannot set the separator argument as '_' or '__'. I hope you understand my question.

Would a simple replace of `__` to `_` then proceed as you've mentioned work? — Jon Clements, Apr 30 '18 at 15:31

score 3 · Accepted Answer · answered Apr 30 '18 at 15:49

3

You could use @Wen old favorite pd.wide_to_long:

pd.wide_to_long(df,['a','b','c'],i='id',j='new',sep='_',suffix='.')\
  .reset_index('new', drop=True)\
  .sort_index()

Output:

     a   b   c
id            
101  a   1  aa
101  b   2  bb
101  c   3  cc
101  d   4  dd
101  e   5  ee
101  f   6  ff
102  g   7  gg
102  h   8  hh
102  i   9  ii
102  j  10  jj
102  k  11  kk
102  l  12  ll

answered Apr 30 '18 at 15:49

Scott Boston

147,308
15
139
187

Could I ask what the difference between this pd.wide_to_long is and the one in the question I linked? – Elias K. Apr 30 '18 at 15:54
1

There is no difference, just don't see why you would use pd.wide_to_long. The "New" column or "dropme" column isn't used anyway, so the single _ vs __ is an issue. Just use a single _ as the seperator and you are fine. – Scott Boston Apr 30 '18 at 15:57
One more question, what if I also had a column called b for example that has a value between b__3 and b_1 so something like 3.5. How would i change your method since now one of the column names is the same as the stub? – Elias K. Apr 30 '18 at 16:32
1

@EliasK. At that point, I would rename the columns to standardize then use the common seperator. IE. df.coumns = df.columns.str.replace('.','_') or something similiar that works as a separator. – Scott Boston Apr 30 '18 at 17:16

score 2 · Answer 2 · answered Apr 30 '18 at 15:36

You can using repalce reaplce all '_' and '__' to '', then we are using stack to reshape your df, after using cumcount create the unique key we can achieve what you need

df.columns=df.columns.str.replace('_','')
s=df.set_index('id').stack().reset_index(level=1)
s.level_1=s.level_1.str[0]
s['New']=s.groupby('level_1').cumcount()



s.set_index(['New','level_1'],append=True)[0].unstack()


Out[509]: 
level_1  a   b   c
id  New           
101 0    a   1  aa
    1    b   2  bb
    2    c   3  cc
    3    d   4  dd
    4    e   5  ee
    5    f   6  ff
102 6    g   7  gg
    7    h   8  hh
    8    i   9  ii
    9    j  10  jj
    10   k  11  kk
    11   l  12  ll

Melt group of columns into target columns by name

2 Answers2