
I have a dataframe where I've removed unwanted values and replaced them with NaN. I now want to consolidate my columns into one column that contains only the valid values:

Name    Set_1   Set_2   Set_3
Task1   A       NaN     NaN 
Task2   NaN     B       NaN 
Task3   NaN     NaN     C

I don't want to drop entire rows containing NaN, as they also contain valid values; instead I want to drop the NaN values and consolidate the data so it looks like this:

Name    Set
Task1   A
Task2   B
Task3   C

I can do this manually for each column by selecting the rows where that column has a valid value, creating a new dataframe, and then dropping the other columns:

df2 = df.loc[df['Set_1'].notna()]
df2 = df2.drop(columns=["Set_2", "Set_3"])

and then concatenating the resulting dataframes, but this is not really practical at scale. I'm looking for a more efficient way to do this, as my real dataset has a lot more columns.
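For reference, the manual per-column approach described above can be expressed as a loop over the Set_* columns followed by a concatenation (a sketch assuming the exact frame shown; the column list would need to be maintained by hand, which is what makes it impractical):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Task1', 'Task2', 'Task3'],
    'Set_1': ['A', np.nan, np.nan],
    'Set_2': [np.nan, 'B', np.nan],
    'Set_3': [np.nan, np.nan, 'C'],
})

# One sub-frame per Set_* column, keeping only the rows where
# that column holds a valid (non-NaN) value
parts = [
    df.loc[df[col].notna(), ['Name', col]].rename(columns={col: 'Set'})
    for col in ['Set_1', 'Set_2', 'Set_3']
]
result = pd.concat(parts).sort_values('Name').reset_index(drop=True)
print(result)
```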

ka4c

2 Answers


You can use melt + groupby/first:

(df.melt(id_vars='Name', value_name='Set')
   .groupby('Name')
   ['Set'].first()
   .reset_index()
)

output:

    Name Set
0  Task1   A
1  Task2   B
2  Task3   C
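The reason this works: melt stacks all the Set_* columns into long form, and GroupBy.first returns the first non-null value in each group, so the NaN cells drop out automatically. A minimal reproduction, assuming the frame from the question:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Task1', 'Task2', 'Task3'],
    'Set_1': ['A', np.nan, np.nan],
    'Set_2': [np.nan, 'B', np.nan],
    'Set_3': [np.nan, np.nan, 'C'],
})

# Long form: one row per (Name, Set_* column) pair
long = df.melt(id_vars='Name', value_name='Set')
# first() skips NaN within each group, keeping the single valid value
out = long.groupby('Name')['Set'].first().reset_index()
print(out)
```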
mozway

Try with bfill:

>>> df.bfill(axis=1).dropna(axis=1)
    Name Set_1
0  Task1     A
1  Task2     B
2  Task3     C
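Note that this leaves the surviving column named Set_1; to match the desired output exactly, a rename can be chained on (a small follow-up sketch, assuming the same frame as in the question):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['Task1', 'Task2', 'Task3'],
    'Set_1': ['A', np.nan, np.nan],
    'Set_2': [np.nan, 'B', np.nan],
    'Set_3': [np.nan, np.nan, 'C'],
})

# bfill(axis=1) pulls each row's first valid value leftwards into Set_1;
# dropna(axis=1) then drops the columns that still contain NaN
out = df.bfill(axis=1).dropna(axis=1).rename(columns={'Set_1': 'Set'})
print(out)
```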
not_speshal