0

There's a dataframe with three columns:

    ID    col1    col2
0    item_1    abc    NaN
1    item_2    bcd    NaN
2    item_3    NaN    NaN
3    item_4    mnb    lkj

I would like to combine col1 and col2 like this:

    ID    col1
    item_1    abc
    item_2    bcd
    item_3    NaN
    item_4    mnb
    item_4    lkj

ID is not index but a column.

I tried with stack().reset_index() but this is not what I want.

Any ideas?

New Dev
  • 48,427
  • 12
  • 87
  • 129
plnnvkv
  • 541
  • 4
  • 14

2 Answers2

1

This is the way to do it with stack.. the way you were trying it before -

  1. Step 1 - df.stack() only the col1 and col 2 (and drop the Nans as well), then keep only the integer with reset_index() index that will be used to merge it in the next step
  2. Step 2 - pd.merge() the initial DataFrame with the stacked one on their index
  3. Step 3 - DONE!
a = pd.DataFrame(df[['col1','col2']].stack(dropna=True),columns=['col1']).reset_index(level=1, drop=True)
pd.merge(df[['ID']],a,how='left',left_index=True, right_index=True)
    ID  col1
0   item_1  abc
1   item_2  bcd
2   item_3  NaN
3   item_4  mnb
3   item_4  lkj

Do update the correct answer in case you find this easier to understand, for anyone finding similar solutions. Cheers!

Akshay Sehgal
  • 18,741
  • 3
  • 21
  • 51
0

Here's one way to do that (in multiple steps, for clarity):

df.loc[df.col1.isna() & df.col2.isna(), "keep_as_na"] = True
df = df.melt(id_vars="ID").dropna()
df.loc[df.variable == "keep_as_na", "value"] = np.NaN
df.drop("variable", axis = 1).sort_values("ID")

The output is:

        ID value
0   item_1   abc
1   item_2   bcd
10  item_3   NaN
3   item_4   mnb
7   item_4   lkj
Roy2012
  • 11,755
  • 2
  • 22
  • 35