-1

I would like to transform a Pandas DataFrame of the following wide format

df = pd.DataFrame([['A', '1', '2', '3'], ['B', '4', '5', '6'], ['C', '7', '8', '9']], columns=['ABC', 'def', 'ghi', 'jkl'])

df =
   ABC  def  ghi  jkl
0    A    1    2    3
1    B    4    5    6
2    C    7    8    9

into a long format, where the values from the first column still correspond to the values in the lower-case columns. The column names cannot be used as stub names. The names of the new columns are irrelevant and could be renamed later.

The output should look something like this:

df =
   0  1
0  A  1
1  A  2
2  A  3
3  B  4
4  B  5
5  B  6
6  C  7
7  C  8
8  C  9

I am not sure how to best and efficiently do this. Can this be done with wide_to_long()? Then I would not know how to deal with stub names. The best would be an efficient one-liner that can be used on a large table.

Many thanks!!

Fhtsm
  • 147
  • 6

2 Answers2

2

Use DataFrame.melt with DataFrame.sort_index and remove variable column:

df1 = (df.melt("ABC", value_name='new', ignore_index=False)
          .sort_index(ignore_index=True)
          .drop('variable', axis=1)
         )
print (df1)
  ABC new
0   A   1
1   A   2
2   A   3
3   B   4
4   B   5
5   B   6
6   C   7
7   C   8
8   C   9

If need more dynamic solution with generate first value of columns names:

first = df.columns[0]
df1 = (df.melt(first, value_name='new', ignore_index=False)
         .sort_index(ignore_index=True)
         .drop('variable', axis=1))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You can use df.stack:

>>> df.set_index('ABC') \
      .stack() \
      .reset_index(level='ABC') \
      .reset_index(drop=True)

  ABC  0
0   A  1
1   A  2
2   A  3
3   B  4
4   B  5
5   B  6
6   C  7
7   C  8
8   C  9

or use df.melt as suggested by @MustafaAydın:

>>> df.melt('ABC') \
      .sort_values('ABC') \
      .drop(columns='variable') \
      .reset_index(drop=True)

  ABC value
0   A     1
1   A     2
2   A     3
3   B     4
4   B     5
5   B     6
6   C     7
7   C     8
8   C     9
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Many thanks! df.stack worked great (keeping the order of the columns). df.melt does not necessarily preserve the order if you have duplicate values in the ABC column I believe. – Fhtsm Jun 29 '21 at 09:20