Stacking select columns as rows in pandas dataframe

Question

Suppose I have df_in below:

df_in = pd.DataFrame({'X': ['a', 'b', 'c'], 'A': [1, 0, 0], 'B': [1, 1, 0]})

df_in:

+---+---+---+---+
|   | X | A | B |
+---+---+---+---+
| 0 | a | 1 | 1 |
| 1 | b | 0 | 1 |
| 2 | c | 0 | 0 |
+---+---+---+---+

I want to achieve something like the following:

df_out = pd.DataFrame({'X': ['a', 'a', 'b'], 'Y': ['A', 'B', 'B']})

df_out:

+---+---+---+
|   | X | Y |
+---+---+---+
| 0 | a | A |
| 1 | a | B |
| 2 | b | B |
+---+---+---+

I also have a list containing the columns: l = list(['A', 'B']). The logic is, for each column in df_in that is in l, repeat those observations where the column value == 1, and add the column name to a new column in df_out, this is Y in the example. In reality there are more columns in df_in and not all of them are in l, which is why I want to solve this without explicit references to columns A, B and X.

NOTE: This is not entirely covered by this answer since, as stated above, there are many columns in reality, and these can be of any type and data, so the solution, df_out, needs to take into account all of the original columns (X in this case). In theory, X can also be a binary 0/1 column, but should only affect the outcome in the same way as A and B if it's included in l. I hope this helps clarify.

Question was originally asked [here](https://stackoverflow.com/questions/60166869/reshape-multiple-columns-into-rows) — Leland Hepworth, Sep 17 '20 at 14:11

score 2 · Accepted Answer · answered Feb 11 '20 at 11:08

Use Index.difference for all columns without l pass to DataFrame.set_index, reshape by DataFrame.stack, filter only 1 and last convert MultiIndex.to_frame to new DataFrame with rename last column:

l = ['A', 'B']

c = df_in.columns.difference(l, sort=False).tolist()
s = df_in.set_index(c).stack()
df_out = s[s == 1].index.to_frame(index=False).rename(columns={len(c):'Y'})
print (df_out)
   X  Y
0  a  A
1  a  B
2  b  B

Stacking select columns as rows in pandas dataframe

1 Answers1