2

I have a df with lots of columns :

   date  risk lev   chemical  weight    date    risk lev   chemical  weight
 15-5-16   5         Potasium   5mg    15-5-16      3       Sodium     7 mg 
 14-5-16   6         Potasium   10mg   14-5-16      2       Sodium     2 mg  

I would like to re-arrange it so that each 4 columns breaks into a new row , such as the df is like this:

   date  risk lev   chemical  weight
 15-5-16      5      Potasium   5mg   
 15-5-16      3       Sodium    7mg  
 14-5-16      6      Potasium   10mg   
 14-5-16      2       Sodium    2mg

Sorry that I do not include my try but is the first time I counter this issue and not sure how to proceed

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
JamesHudson81
  • 2,215
  • 4
  • 23
  • 42

3 Answers3

2

first get rid of duplicates in column names:

In [248]: df
Out[248]:
      date  risk lev  chemical weight   date.1  risk lev.1 chemical.1 weight.1
0  15-5-16         5  Potasium    5mg  15-5-16           3     Sodium     7 mg
1  14-5-16         6  Potasium   10mg  14-5-16           2     Sodium     2 mg

now we can use pd.lreshape

In [249]: d = {
     ...:     'chemical': ['chemical','chemical.1'],
     ...:     'weight':['weight','weight.1'],
     ...:     'date':['date','date.1'],
     ...:     'risk lev': ['risk lev','risk lev.1']
     ...: }

In [250]: pd.lreshape(df, d)
Out[250]:
   chemical weight     date  risk lev
0  Potasium    5mg  15-5-16         5
1  Potasium   10mg  14-5-16         6
2    Sodium   7 mg  15-5-16         3
3    Sodium   2 mg  14-5-16         2
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
2

First remove duplicates with column names and then use pd.lreshape with dict created by dict comprehension:

s = df.columns.to_series()
df.columns = s.add(s.groupby(s).cumcount().astype(str))
print (df)
     date0  risk lev0 chemical0 weight0    date1  risk lev1 chemical1 weight1
0  15-5-16          5  Potasium     5mg  15-5-16          3    Sodium     7mg
1  14-5-16          6  Potasium    10mg  14-5-16          2    Sodium     2mg


cols = ['date','risk lev','chemical','weight']
d = {x:df.columns[df.columns.str.startswith(x)].tolist() for x in cols}
print (d)
{'date': ['date0', 'date1'], 
 'weight': ['weight0', 'weight1'], 
 'risk lev': ['risk lev0', 'risk lev1'], 
 'chemical': ['chemical0', 'chemical1']}

df = pd.lreshape(df, d)
print (df)
      date weight  risk lev  chemical
0  15-5-16    5mg         5  Potasium
1  14-5-16   10mg         6  Potasium
2  15-5-16    7mg         3    Sodium
3  14-5-16    2mg         2    Sodium
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

This problem is more robustly solved with pd.wide_to_long. You must place a digit at the end of each column first.

df.columns = [col + str(i//4 + 1) for i, col in enumerate(df.columns)]

pd.wide_to_long(df.reset_index(), 
                stubnames=['date', 'risk lev', 'chemical', 'weight'], 
                i='index', 
                j='dropme').reset_index(drop=True)

      date  risk lev  chemical weight
0  15-5-16         5  Potasium    5mg
1  14-5-16         6  Potasium   10mg
2  15-5-16         3    Sodium    7mg
3  14-5-16         2    Sodium    2mg
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136