0

I have a dataframe like so:

Input:

df = pd.DataFrame({'a': range(3), 'b': np.arange(3)-1})

Desired output:

df_rearranged = pd.DataFrame({'data': [0,1,2,-1,0,1], 'origin': ['a', 'a', 'a', 'b', 'b', 'b']})

I have found a (hacky) way of doing this:

Attempt:

subset_1 = df[['a']]
subset_1['origin'] = 'a'
subset_1.rename(columns={'a':'data'}, inplace=True)

subset_2 = df[['b']]
subset_2['origin'] = 'b'
subset_2.rename(columns={'b':'data'}, inplace=True)

df_rearranged = subset_1.append(subset_2)

This works, but it quickly becomes impractical when I want to pool larger numbers of columns. Also, I feel that there should be a function in pandas that does this by default, but I am lacking the keywords to find it. Help is greatly appreciated!

warped
  • 8,947
  • 3
  • 22
  • 49

1 Answers1

1

Use DataFrame.melt with change order columns by DataFrame.reindex:

df1 = df.melt(var_name='origin', value_name='data').reindex(['data','origin'], axis=1)
print (df1)
   data origin
0     0      a
1     1      a
2     2      a
3    -1      b
4     0      b
5     1      b

Or DataFrame constructor with numpy.ravel and numpy.repeat, obviously working with better performance:

df1 = pd.DataFrame({'data':df.values.ravel(), 'origin':np.repeat(df.columns, len(df))})
print (df1)
   data origin
0     0      a
1    -1      a
2     1      a
3     0      b
4     2      b
5     1      b
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252