python pandas perform calculation based on condition

Question

I have some challenge in my work regarding working with pandas.

I have a pandas dataframe with columns Time, x, y, a, b.

For simplicity, the dataframe has 5 records, from which 3 records are fully filled (Time, x,y,a,b has data). Other 2 a,b are empty. Time is unidirectional

I'd like to perform a calculation on some condition on Time (lets say Time > 3) and store result on a and b (lets say the functions are a=x^2, b=x^3). The calculation of a,b shall be performed in single function (I'm using lambda function). For example

Time    x   y       a        b
0.3     0   1     2.0      3.0
1.5     4   5     6.0      7.0
2.8     8   9    10.0     11.0
3.3     8  13    None     None
4.5     3  17    None     None

Shall be converted to

Time    x   y       a        b
0.3     0   1     2.0      3.0
1.5     4   5     6.0      7.0
2.8     8   9    10.0     11.0
3.3     8  13    64.0    512.0
4.5     3  17     9.0     27.0

Any assistance would be appreciated

Notes:

number of records here are for simplicity and the code shall be general for every number of records.
need that the code will be optimized for performance.

score 3 · Accepted Answer · answered Aug 27 '23 at 13:42

3

Sure - you can use .loc with a suitable filter to select the rows and columns to work on, e.g.:

df.loc[df['x'] > 10, ['a', 'b']] = (
    df.loc[df['x'] > 10, 'x'].map(lambda x: (x ** 2, x ** 3))
)

Note how returning a tuple from the lambda and having ['a', 'b'] in the assignment side lets you do both calculations at once.

answered Aug 27 '23 at 13:42

AKX

152,115
15
115
172

My data, functions are more complicate, but this shall be more than enough for me in order to continue – user1977050 Aug 27 '23 at 15:07
Which pandas version did you use? `map` doesn't allow unpacking in the latest version. Also constructing a Series of tuples with apply/map is not vectorized. – mozway Aug 27 '23 at 16:02
@mozway pandas version 1.3.5 – user1977050 Aug 28 '23 at 05:11
@user1977050 my question was originally for AKX. That said, this is a very old version, you should consider updating ;) Keep in mind that this solution is both inefficient and doesn't work on a recent pandas. You said in the question that you needed an efficient approach, `map` is not. – mozway Aug 28 '23 at 05:19

mozway · Answer 2 · 2023-08-28T11:46:29.567

0

Use simple boolean indexing and vectorial code:

m = df['Time'].gt(3)
x = df.loc[m, 'x']

df.loc[m, 'a'] = x**2
df.loc[m, 'b'] = x**3

Output:

   Time  x   y     a      b
0   0.3  0   1   2.0    3.0
1   1.5  4   5   6.0    7.0
2   2.8  8   9  10.0   11.0
3   3.3  8  13  64.0  512.0
4   4.5  3  17   9.0   27.0

Comparison of efficiency with a map (without assignment as this raises a TypeError anyway on pandas 2.0.1):

edited Aug 28 '23 at 11:46

answered Aug 27 '23 at 14:27

mozway

194,879
13
39
75

The example is simplified. In my case, the values of `a,b` are calculated based on some other values of the data frame, which are not related to x and may be based on object values (i.e. if `c='LOSS' then b='TEST'`) – user1977050 Aug 28 '23 at 10:22
@user1977050 can you provide an example? There are pandas methods to do that efficiently (i.e without a loop/map/apply) – mozway Aug 28 '23 at 11:07
To be more specific: data frame contains time stamp. I'd like to find all records where timestamp is greater than X. On those records, apply function to some columns which will give a result to other column(s). (In the example above, x is for time stamp) – user1977050 Aug 28 '23 at 11:14
Can you [edit](https://stackoverflow.com/posts/76987365/edit) your question with a reproducible example? – mozway Aug 28 '23 at 11:23
Question has been updated – user1977050 Aug 28 '23 at 11:26
But you didn't provide a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway Aug 28 '23 at 11:28
Try now :) :) :) – user1977050 Aug 28 '23 at 11:38
@user1977050 I updated the code, see it's almost the same, just the condition changes. You can use a condition on multiple columns if needed. No need for `map` or `apply` that are slow. – mozway Aug 28 '23 at 11:47
im guessing my answer probably slower than yours, but just thought i would throw another solution in the mix :) – rhug123 Aug 28 '23 at 19:03

score -1 · Answer 3 · answered Aug 28 '23 at 13:33

-1

Here is a way:

df.fillna(df[['x','x']].pow([2,3]).set_axis(list('ab'),axis=1).where(df['Time'].gt(3)))

Output:

   Time  x   y     a      b
0   0.3  0   1   2.0    3.0
1   1.5  4   5   6.0    7.0
2   2.8  8   9  10.0   11.0
3   3.3  8  13  64.0  512.0
4   4.5  3  17   9.0   27.0

answered Aug 28 '23 at 13:33

rhug123

7,893
1
9
24

even though the code is performing the scenario, it is very specific and not general – user1977050 Aug 29 '23 at 05:40
i think it should be able to work for different situations. – rhug123 Aug 29 '23 at 13:19
the solution to use `pow` is very specific. if I need to use more complicated function? – user1977050 Aug 31 '23 at 07:50

python pandas perform calculation based on condition

3 Answers3