1

I am trying to populate a new column within a pandas dataframe by using values from several columns. The original columns are either 0 or '1' with exactly a single 1 per series. The new column would correspond to df['A','B','C','D'] by populating new_col = [1, 3, 7, 10] as shown below. (A 1 at A means new_col = 1; if B=1,new_col = 3, etc.)

df    
         A    B    C    D
1        1    0    0    0
2        0    0    1    0
3        0    0    0    1
4        0    1    0    0

The new df should look like this.

df    
         A    B    C    D   new_col
1        1    0    0    0         1
2        0    0    1    0         7
3        0    0    0    1        10
4        0    1    0    0         3

I've tried to use map, loc, and where but can't seem to formulate an efficient way to get it done. Problem seems very close to this. A couple other posts I've looked at 1 2 3. None of these show how to use multiple columns conditionally to fill a new column based on a list.

Community
  • 1
  • 1

2 Answers2

2

I can think of a few ways, mostly involving argmax or idxmax, to get either an ndarray or a Series which we can use to fill the column.

We could drop down to numpy, find the maximum locations (where the 1s are) and use those to index into an array version of new_col:

In [148]: np.take(new_col,np.argmax(df.values,1))
Out[148]: array([ 1,  7, 10,  3])

We could make a Series with new_col as the values and the columns as the index, and index into that with idxmax:

In [116]: pd.Series(new_col, index=df.columns).loc[df.idxmax(1)].values
Out[116]: array([ 1,  7, 10,  3])

We could use get_indexer to turn the column idxmax results into integer offsets we can use with new_col:

In [117]: np.array(new_col)[df.columns.get_indexer(df.idxmax(axis=1))]
Out[117]: array([ 1,  7, 10,  3])

Or (and this seems very wasteful) we could make a new frame with the new columns and use idxmax directly:

In [118]: pd.DataFrame(df.values, columns=new_col).idxmax(1)
Out[118]: 
0     1
1     7
2    10
3     3
dtype: int64
DSM
  • 342,061
  • 65
  • 592
  • 494
0

It's not the most elegant solution, but for me it beats the if/elif/elif loop:

d = {'A': 1, 'B': 3, 'C': 7, 'D': 10}
def new_col(row):
    k = row[row == 1].index.tolist()[0]
    return d[k]

df['new_col'] = df.apply(new_col, axis=1)

Output:

    A   B   C   D   new_col
1   1   0   0   0   1
2   0   0   1   0   7
3   0   0   0   1   10
4   0   1   0   0   3
shawnheide
  • 797
  • 4
  • 11