
Suppose I have a dataframe:

C1 V1 C2 V2 Cond
1  2  3  4  X  
5  6  7  8  Y  
9  10 11 12 X

The logic should be: if Cond == X, pick C1 and V1; otherwise pick C2 and V2.

The output dataframe is something like:

C  V 
1  2 
7  8
9  10

**EDIT:** One more requirement: the number of columns can change, but they follow a naming pattern. In this case, select all columns containing "1" when Cond == X, otherwise all columns containing "2". I think a hard-coded solution might not work.

Emmanuel
iwbabn
    Possible duplicate of [Create Column with ELIF in Pandas](http://stackoverflow.com/questions/18194404/create-column-with-elif-in-pandas) – e4c5 Jan 02 '17 at 00:41
  • `indexer = {'X': ['C1', 'V1'], 'Y': ['C2', 'V2']}; pd.concat([pd.DataFrame(df.loc[df.Cond==k, v].values, columns= ['C','V']) for k,v in indexer.items()])` is one way of doing this, but it does not conserve the order of the rows. – Abdou Jan 02 '17 at 01:45

5 Answers


Another option with DataFrame.where():

df[['C1', 'V1']].where(df.Cond == "X", df[['C2', 'V2']].values)

#  C1   V1
#0  1    2
#1  7    8
#2  9   10
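The edit asks for pattern-based column selection rather than hard-coded names. A minimal sketch extending this idea with `filter` and a broadcast NumPy mask (the sample data and the "1"/"2" suffix convention are taken from the question; the column renaming via `str.rstrip('1')` is one possible choice):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [1, 5, 9], 'V1': [2, 6, 10],
                   'C2': [3, 7, 11], 'V2': [4, 8, 12],
                   'Cond': ['X', 'Y', 'X']})

df1 = df.filter(like='1')                       # columns C1, V1
df2 = df.filter(like='2')                       # columns C2, V2
mask = (df['Cond'] == 'X').to_numpy()[:, None]  # shape (3, 1), broadcasts over columns

out = pd.DataFrame(np.where(mask, df1, df2),
                   index=df.index,
                   columns=df1.columns.str.rstrip('1'))
print(out)
#    C   V
# 0  1   2
# 1  7   8
# 2  9  10
```

This keeps the original row order and scales to any number of column pairs that follow the suffix pattern.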
Psidom
  • drop Cond to focus on the values I'm selecting from
  • reshape the numpy array so I can differentiate with a boolean value
  • index the first dimension with np.arange(len(df)), once for each row
  • index the second dimension with df.Cond.ne('X').mul(1): 0 where Cond equals 'X', 1 otherwise
  • construct the final dataframe

pd.DataFrame(
    df.drop(columns='Cond').values.reshape(len(df), 2, 2)[
        np.arange(len(df)),
        df.Cond.ne('X').mul(1)
    ], df.index, ['C', 'V'])

   C   V
0  1   2
1  7   8
2  9  10
piRSquared

You can use df.loc and df.append if the order of the rows is not important.

ndf1 = df.loc[df['Cond'] == 'X', ['C1','V1']]
ndf2 = df.loc[df['Cond'] == 'Y', ['C2','V2']]
ndf1.columns = ['C','V']
ndf2.columns = ['C','V']

result = ndf1.append(ndf2).reset_index(drop=True)
print(result)
   C   V
0  1   2
1  9  10
2  7   8
Joe T. Boka

I'll try to create a more general solution with filter and numpy.where, using str.extract for the new column names:

#if necessary sort columns
df = df.sort_index(axis=1)

#filter df by 1 and 2
df1 = df.filter(like='1')
df2 = df.filter(like='2')
print (df1)
   C1  V1
0   1   2
1   5   6
2   9  10

print (df2)
   C2  V2
0   3   4
1   7   8
2  11  12
#np.where need same shape of mask as df1 and df2
mask = pd.concat([df.Cond == 'X']*len(df1.columns), axis=1)
print (mask)
    Cond   Cond
0   True   True
1  False  False
2   True   True

cols = df1.columns.str.extract('([A-Za-z])', expand=False)
print (cols)
Index(['C', 'V'], dtype='object')

print (np.where(mask, df1,df2))
[[ 1  2]
 [ 7  8]
 [ 9 10]]

print (pd.DataFrame(np.where(mask, df1, df2), index=df.index, columns=cols))
   C   V
0  1   2
1  7   8
2  9  10
jezrael

You can try using an approach similar to that in this post

First, define a couple of functions:

def cond(row):
    return row['Cond'] == 'X'

def helper(row, col_if, col_ifnot):
    return row[col_if] if cond(row) else row[col_ifnot]

Then, assuming your dataframe is called df,

df_new = pd.DataFrame(index=df.index)
for col in ['C', 'V']:
    col_1 = col + '1'
    col_2 = col + '2'
    df_new[col] = df.apply(lambda row: helper(row, col_1, col_2), axis=1)

Keep in mind that this approach might be slow for large dataframes, since apply doesn't take advantage of vectorization. But, it should work even with arbitrary column names (just replace ['C', 'V'] with your actual column names).
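To address the edit without hard-coding, a sketch of the same loop with the column stems discovered from the "1" suffix (assuming the naming pattern from the question; `stems` is a name I introduce here):

```python
import pandas as pd

df = pd.DataFrame({'C1': [1, 5, 9], 'V1': [2, 6, 10],
                   'C2': [3, 7, 11], 'V2': [4, 8, 12],
                   'Cond': ['X', 'Y', 'X']})

def cond(row):
    return row['Cond'] == 'X'

def helper(row, col_if, col_ifnot):
    return row[col_if] if cond(row) else row[col_ifnot]

# derive the stems ('C', 'V') from the columns ending in '1'
stems = [c[:-1] for c in df.columns if c.endswith('1')]

df_new = pd.DataFrame(index=df.index)
for col in stems:
    df_new[col] = df.apply(lambda row: helper(row, col + '1', col + '2'), axis=1)
print(df_new)
#    C   V
# 0  1   2
# 1  7   8
# 2  9  10
```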

Community
vbox