Merge rows with same index and prioritize column values

Question

I have a dataframe with some duplicate index values with columns containing values for two different experiments. I want to prioritize Col_A if values are present across both index instances. I am working to solve this solution using the following algorithm.

Merge rows with the same index

Pandas merging rows with the same value and same index

Create new column with combine_first function. to select Col_A if present otherwise select Col_B

Example Data

data = {'id':['id3', 'id3', 'id6'],
       'Col_A':[11,NaN,3],
       'Col_B':[NaN,5,NaN]}

## Insert SO Magic Here

##

output = {'id':['id3', 'id6'],
       'Col_Score':[11,3]}

If there is a "better" solution (shorter) than my proposed algorithm, please let me know.

score 2 · Accepted Answer · answered Aug 11 '21 at 18:59

If you’re guaranteed to not have duplicate columns per id, then the data (or rather pd.DataFrame(data)) can easily be reformatted as such:

>>> ser = data.set_index('id').stack()
>>> ser
id        
id3  Col_A    11.0
     Col_B     5.0
id6  Col_A     3.0
dtype: float64

As a side note, if you unstack it again, you get a more dense version o your original data with a unique index:

>>> ser.unstack()
     Col_A  Col_B
id               
id3   11.0    5.0
id6    3.0    NaN

We can select the first item with a groupby rather than .unstack(), for example:

>>> ser.groupby('id').first().rename('Col_score')
id
id3    11.0
id6     3.0
Name: Col_Score, dtype: float64

You can then .reset_index() onto that to get a dataframe instead of a series.

score 1 · Answer 2 · answered Aug 11 '21 at 19:17

Here is a solution using melt:

(df.melt(id_vars=['id'], value_name='Col_Score') # Col_A will be above Col_B
   .dropna()                                     # remove NaN rows
   .groupby('variable', as_index=False)
   .first()                                      # keep first per group (i.e. Col_A when both)
   .drop('variable', axis=1)                     # cleanup
)

output:

    id  Col_Score
0  id3       11.0
1  id3        5.0

Merge rows with same index and prioritize column values

2 Answers2