0

How do I combine a lot of columns in Pandas?

I Have 20 Columns that i want to turn in just one

Exemple above w/ 4 Columns:

ID Type AB1 AB2 AB3 AB4
0 True AA CC NAN ZZ
1 False BB NAN HH JU

I Want to Turn it in:

ID Type AB
0 True AA
1 False BB
2 True CC
3 False HH
4 True ZZ
5 False JU

Does anyone has a tip?

Need a way to preserve the "Type"

Thank you!

marceloasr
  • 343
  • 1
  • 3
  • 11

2 Answers2

0

You can try something like that:

>>> df.melt('ID', value_name='AB').dropna().rename_axis('ID')['AB'].reset_index()

   ID  AB
0   0  AA
1   1  BB
2   2  CC
3   5  HH
4   6  ZZ
5   7  JU
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • 1
    Please don't answer duplicate questions, instead [vote to close](https://stackoverflow.com/help/privileges/close-questions) and/or close them, as you can do with both Python and pandas gold badges. – BigBen Dec 01 '21 at 21:44
  • @BigBen. Many many question could be closed because `melt` is often the right answer to a answer which need to flat a dataframe... The ugly part of this problem is probably `.dropna().rename_axis('ID')['AB'].reset_index()`. Don't you think so? – Corralien Dec 01 '21 at 21:49
  • 1
    Possibly, but that could easily be a comment to OP. If the answer is a one-liner, or generally short, when I close questions, I normally leave the one-liner answer as a comment. This is helpful to OP, but also promotes the linked duplicate, which is much more valuable to the site as a whole. It also promotes [teaching to fish over giving a fish](https://en.wiktionary.org/wiki/give_a_man_a_fish_and_you_feed_him_for_a_day;_teach_a_man_to_fish_and_you_feed_him_for_a_lifetime). – BigBen Dec 01 '21 at 21:51
0

There's some different ways to tackle it but you can primarily use the stack method:

df = pd.DataFrame({"id": [0, 1], "a": [1,2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})
stack_columns = ["a", "b", "c", "d"]
# Assuming you care about the id
stacked_df = df.set_index("id").stack().reset_index()
stacked_df.drop("level_1", axis=1, inplace=True)
stacked_df.rename(columns={0: "".join(stack_columns)}, inplace=True)

# Assuming you don't care and want a new id
stacked_df_2 = df.loc[:, stack_columns].stack().reset_index()
stacked_df_2.drop(["level_0", "level_1"], axis=1, inplace=True)
stacked_df_2.rename(columns={0: "".join(stack_columns)}, inplace=True)