0

I have a dataframe with some duplicate index values with columns containing values for two different experiments. I want to prioritize Col_A if values are present across both index instances. I am working to solve this solution using the following algorithm.

  1. Merge rows with the same index

Pandas merging rows with the same value and same index

  1. Create new column with combine_first function. to select Col_A if present otherwise select Col_B

Example Data

data = {'id':['id3', 'id3', 'id6'],
       'Col_A':[11,NaN,3],
       'Col_B':[NaN,5,NaN]}

## Insert SO Magic Here

##

output = {'id':['id3', 'id6'],
       'Col_Score':[11,3]}

If there is a "better" solution (shorter) than my proposed algorithm, please let me know.

Cody Glickman
  • 514
  • 1
  • 8
  • 30

2 Answers2

2

If you’re guaranteed to not have duplicate columns per id, then the data (or rather pd.DataFrame(data)) can easily be reformatted as such:

>>> ser = data.set_index('id').stack()
>>> ser
id        
id3  Col_A    11.0
     Col_B     5.0
id6  Col_A     3.0
dtype: float64

As a side note, if you unstack it again, you get a more dense version o your original data with a unique index:

>>> ser.unstack()
     Col_A  Col_B
id               
id3   11.0    5.0
id6    3.0    NaN

We can select the first item with a groupby rather than .unstack(), for example:

>>> ser.groupby('id').first().rename('Col_score')
id
id3    11.0
id6     3.0
Name: Col_Score, dtype: float64

You can then .reset_index() onto that to get a dataframe instead of a series.

Cimbali
  • 11,012
  • 1
  • 39
  • 68
1

Here is a solution using melt:

(df.melt(id_vars=['id'], value_name='Col_Score') # Col_A will be above Col_B
   .dropna()                                     # remove NaN rows
   .groupby('variable', as_index=False)
   .first()                                      # keep first per group (i.e. Col_A when both)
   .drop('variable', axis=1)                     # cleanup
)

output:

    id  Col_Score
0  id3       11.0
1  id3        5.0
mozway
  • 194,879
  • 13
  • 39
  • 75