I have the following pandas dataframe:

     A       B
0   16.3    1.10
1   23.2    1.33
2   10.7   -0.43
3   5.7    -2.01
4   5.4    -1.86
5   23.5    3.14

I would like to generate a third column by comparing the values of column A within each pair of adjacent rows: the first two rows, then the next two, and so on.

It might be a little confusing so I will give an example:

  • For the first row, if 16.3 - 23.2 < 5, the value of the new column C in that row should be Bad; if the difference is == 5 it should be Decent; and if the difference is > 5 it should be Good
  • For the second row, apply the same logic but use 23.2 - 16.3 instead, and generate the value of C for that row from that difference
  • Do the same for rows 3 and 4 as a pair, rows 5 and 6 as a pair, and so on
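
To make the rule itself concrete, here is a minimal sketch of it as a plain Python function. The name label_difference, the threshold parameter, and the use of math.isclose for the "== 5" case are assumptions made for illustration. It covers only the labelling of a single difference, not the pairing of rows.

```python
import math


def label_difference(diff, threshold=5.0):
    """Map one pairwise difference to 'Bad', 'Decent' or 'Good'."""
    # Compare with a tolerance, because a float difference such as
    # 10.7 - 5.7 may not be exactly equal to 5.0.
    if math.isclose(diff, threshold):
        return "Decent"
    return "Good" if diff > threshold else "Bad"


# Differences taken from the first two pairs of the table above
print(label_difference(16.3 - 23.2))  # row 1: negative difference
print(label_difference(23.2 - 16.3))  # row 2: difference of about 6.9
print(label_difference(10.7 - 5.7))   # row 3: difference of about 5
```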

So the resulting dataframe should be as follows:

     A       B      C
0   16.3    1.10   Bad
1   23.2    1.33   Good
2   10.7   -0.43   Decent
3   5.7    -2.01   Bad
4   5.4    -1.86   Bad
5   23.5    3.14   Good

I have looked around a bit and found that you can define a function which returns different states and then use df.apply.

So I thought maybe it could be possible to create 2 functions: one for the odd rows that compares the value of A to the next row, and another for the even rows that compares it to the previous row.

However, I can't wrap my head around how to apply both functions together to generate the column C.

How could I implement that, or if there is an easier solution, how could it be done?

Sd Junk

1 Answer

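You can use numpy.select, with numpy.isclose for the equality test, because comparing floats with == is unreliable due to floating-point precision. To get the swapped differences (second value minus first), create a helper DataFrame.

Note:
This solution works only for an even number of rows.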

import numpy as np
import pandas as pd

print (df)
      A      B
0  16.3   1.10
1  23.2   1.33
2  10.7  -0.43
3   5.7  -2.01
4   5.4  -1.86
5  23.5   3.14
6  11.7   4.00
7  24.9  10.00

# create a default RangeIndex
df = df.reset_index(drop=True)

# build a MultiIndex (pair number, position in pair) by integer and modulo division, then reshape
df1 = df.set_index([df.index // 2, df.index % 2]).unstack()
# subtract the second value of each pair from the first
df1 = df1.xs(0, axis=1, level=1) - df1.xs(1, axis=1, level=1)
# append the negated differences; a stable sort keeps each difference ahead of its negation
df1 = pd.concat([df1, df1 * -1]).sort_index(kind='mergesort').reset_index(drop=True)
print (df1)
      A     B
0  -6.9 -0.23
1   6.9  0.23
2   5.0  1.58
3  -5.0 -1.58
4 -18.1 -5.00
5  18.1  5.00
6 -13.2 -6.00
7  13.2  6.00

masks = [np.isclose(df1, 5), df1.values < 5]
vals = ['Decent','Bad']

# create a new DataFrame of labels and join it to the original
df = df.join(pd.DataFrame(np.select(masks, vals, 'Good'), columns=df.columns).add_suffix('_new'))
print (df)
      A      B   A_new   B_new
0  16.3   1.10     Bad     Bad
1  23.2   1.33    Good     Bad
2  10.7  -0.43  Decent     Bad
3   5.7  -2.01     Bad     Bad
4   5.4  -1.86     Bad     Bad
5  23.5   3.14    Good  Decent
6  11.7   4.00     Bad     Bad
7  24.9  10.00    Good    Good
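
If only column A is needed, as in the question, the same idea can be sketched more compactly by reshaping the column into pairs with NumPy. This is an alternative illustration rather than the code above. It rebuilds the six-row frame from the question, and the variable names (pairs, diff, conditions, choices) are assumptions. It also raises an error when the number of rows is odd.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [16.3, 23.2, 10.7, 5.7, 5.4, 23.5],
    "B": [1.10, 1.33, -0.43, -2.01, -1.86, 3.14],
})

# The pairing is only defined for an even number of rows.
if len(df) % 2 != 0:
    raise ValueError("expected an even number of rows")

# Reshape A to (n_pairs, 2): column 0 holds the first row of each pair,
# column 1 holds the second.
pairs = df["A"].to_numpy().reshape(-1, 2)

# "Own value minus partner value" for every row: reversing the columns
# places each row's partner at its position.
diff = (pairs - pairs[:, ::-1]).ravel()

# np.select uses the first matching condition, so the isclose test
# has to come before the "< 5" test.
conditions = [np.isclose(diff, 5), diff < 5]
choices = ["Decent", "Bad"]
df["C"] = np.select(conditions, choices, default="Good")
print(df)
```

For the six rows from the question this gives Bad, Good, Decent, Bad, Bad, Good in column C, matching the expected result.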
jezrael
  • Thank you for your answer. I have tried to run it and it works up until more rows are added to the column `A`. If I add two more rows with the values `24.9` and `11.7` for example, then the 6th row of `C` becomes `Bad` which it shouldn't. It should still say `Good`. – Sd Junk Dec 04 '18 at 10:57
  • Thank you. This is exactly what I was looking for. It's working like a charm. – Sd Junk Dec 05 '18 at 20:02