0

I have the following dataframe:

+----+----+---------+----------+
|    |A   |   B     |   C      |
|----+----+---------+----------|
|  0 |S   |   nan   |   5464.5 |
|  1 |A   |  5464.5 |   5464.5 |
|  2 |A   |  5675.5 |    nan   |
|  3 |S   |  5670   |    nan   |
|  4 |A   |  5664   |    nan   |
|  5 |B   |  5635.5 |    nan   |
|  6 |D   |  5624   |    nan   |
|  7 |C   |  5624   |    nan   |
|  8 |X   |   nan   |   5464.5 |
|  9 |C   |  5715.5 |    nan   |
| 10 |D   |  4704   |   5000   |
+----+----+---------+----------+

I want to replace the nan values in B and values in B < 5000 with a condition: if column A is 'S' it should be replaced with Column C if column A is 'X' it should be replaced with column C+10 if column A is 'D' it should be replaced with column C-10

so the result should look like this:

+----+----+---------+----------+
|    |A   |   B     |   C      |
|----+----+---------+----------|
|  0 |S   |  5464.5 |   5464.5 |
|  1 |A   |  5464.5 |   5464.5 |
|  2 |A   |  5675.5 |    nan   |
|  3 |S   |  5670   |    nan   |
|  4 |A   |  5664   |    nan   |
|  5 |B   |  5635.5 |    nan   |
|  6 |D   |  5624   |    nan   |
|  7 |C   |  5624   |    nan   |
|  8 |X   |  5474.5 |   5464.5 |
|  9 |C   |  5715.5 |    nan   |
| 10 |D   |  4704   |   4990   |
+----+----+---------+----------+

What is the most elegant, simple and readable way to solve this. I tend to loop over the table and change it, but i can find a lot of warnings not to do this.

Egirus Ornila
  • 1,234
  • 4
  • 14
  • 39

2 Answers2

1

you can use & or | operator to combine multiple conditions, and then use .loc method to select the data filtered by the condition.

And two matters need attention: 1. use .values to get the values assigned to the dataframe. 2. do not forget the parentheses when using == operator.

index = df['B'].isna()|(df['B']<5000)

index_1 = index & (df['A']=='S')
df.loc[index_1, 'B'] = df.loc[index_1, 'C'].values

index_2 = index & (df['A']=='X')
df.loc[index_2, 'B'] = (df.loc[index_2, 'C']+10).values

index_3 = index & (df['A']=='D')
df.loc[index_3, 'B'] = (df.loc[index_3, 'C']-10).values
Woods Chen
  • 574
  • 3
  • 13
  • I think in the third and fourth expressions df.loc[index_1, 'C'] should be replaced with df.loc[index_2, 'C'] and df.loc[index_3, 'C'], respectively. – Sam Jun 18 '19 at 11:51
0

I don't know how elegant it is, but it works:

df2 = df.copy()

df2.B[((df2.A=='S')&(df2.B.isna()))|((df2.A=='S')&(df2.B<5000))]=df2.C
df2.B[((df2.A=='X')&(df2.B.isna()))|((df2.A=='X')&(df2.B<5000))]=df2.C+10
df2.B[((df2.A=='D')&(df2.B.isna()))|((df2.A=='D')&(df2.B<5000))]=df2.C-10

Also, make sure you work on a copy of your initial DataFrame, just in case as shown in this answer.

Sam
  • 110
  • 1
  • 1
  • 8
  • That is not true - the answer works just fine and does exactly what Egirus has been asking for. – Sam May 09 '19 at 14:11