0

I would like to emulate an Excel formula in Pandas. I've tried this:

import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1, 0], 'b': [5, 3, 2, 1]})
df['c'] = lambda x : df.a + df.b + 1 # Displays <function <lambda> ..> instead of the result
df['d'] = df.a + df.b + 1 # Static computation
df.a *= 2
df # Result of column c and d not updated :(
   a  b                                      c  d
0  6  5  <function <lambda> at 0x7f2354ddcca0>  9
1  4  3  <function <lambda> at 0x7f2354ddcca0>  6
2  2  2  <function <lambda> at 0x7f2354ddcca0>  4
3  0  1  <function <lambda> at 0x7f2354ddcca0>  2

What I expect is:

df
   a  b   c
0  6  5  12
1  4  3   8
2  2  2   5
3  0  1   2
df.a /= 2
   a  b  c
0  3  5  9
1  2  3  6
2  1  2  4
3  0  1  2

Is it possible to have a dynamically computed column in Pandas?

nowox
  • 1
    @anky Even with `apply`, the column is not refreshed when column `a` or `b` is updated – nowox Feb 08 '21 at 19:17
  • The `lambda` function is just sitting there idle. You need to fire it with `df['c'][0]()`, but every lambda calculates the whole `df.a` + `df.b`. I think you want a row-wise sum, which is more reasonable – Epsi95 Feb 08 '21 at 19:18
  • @Epsi95 In this example I am doing a sum but in my real case I would like something more complex – nowox Feb 08 '21 at 19:19
  • Can you please add what you expect to see in the result dataset? – DanCor Feb 08 '21 at 19:20
  • @DanCor I've added an example – nowox Feb 08 '21 at 19:22
  • 1
    Oh okay, I think you are looking for some sort of `hook` which will automatically update a column, but I never saw such thing in pandas. Maybe others can tell – Epsi95 Feb 08 '21 at 19:22
  • 2
    I don't think dynamically updating a/some columns of a dataframe is supported by Pandas. You may need to write a wrapper. – Quang Hoang Feb 08 '21 at 19:23
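
As a rough illustration of the wrapper idea mentioned in the comments above, here is a minimal sketch. The `ComputedFrame` class and its `add_formula` and `view` members are hypothetical names made up for this example, not part of pandas. It assumes that recomputing the formula columns on every read is acceptable, and it relies on `DataFrame.assign` accepting callables.

```python
import pandas as pd


class ComputedFrame:
    """Hypothetical wrapper: stores formulas and re-evaluates them on each read."""

    def __init__(self, df):
        self.df = df          # the underlying, editable source data
        self.formulas = {}    # column name -> function taking a DataFrame

    def add_formula(self, name, func):
        self.formulas[name] = func

    @property
    def view(self):
        # DataFrame.assign calls each function on the current frame and
        # returns a new DataFrame, so computed columns are never stale.
        return self.df.assign(**self.formulas)


cf = ComputedFrame(pd.DataFrame({'a': [3, 2, 1, 0], 'b': [5, 3, 2, 1]}))
cf.add_formula('c', lambda d: d.a + d.b + 1)

before = cf.view['c'].tolist()   # computed from the original a
cf.df['a'] *= 2                  # edit the source column
after = cf.view['c'].tolist()    # recomputed from the doubled a
```

Here `before` is `[9, 6, 4, 2]` and `after` is `[12, 8, 5, 2]`, which matches the output expected in the question. The trade-off is that every access to `view` recomputes all formulas, so this is only a starting point for small frames.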

2 Answers

0

This code might be a step in the right direction:

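import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1, 0], 'b': [5, 3, 2, 1]})

# For each value x in column a, build the Series x + df.b + 1
c_list2 = list(map(lambda x: x + df.b + 1, list(df.a)))

# Keep only row i of the i-th Series, i.e. a[i] + b[i] + 1
c_list = []
for i in range(len(df)):
    c_list.append(c_list2[i].iloc[i])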

df['c'] = c_list
df['d'] = df.a + df.b # Static computation
df.a *= 2
df 
0

Reactivity between columns in a DataFrame does not seem practically feasible. My cellopype package does give you Excel-like reactivity between DataFrames. Here's my take on your question:

pip install cellopype
import pandas as pd
from cellopype import Cell

# define source df and wrap it in a Cell:
df_ab = pd.DataFrame({'a': [3, 2, 1, 0], 'b': [5, 3, 2, 1]})
cell_ab = Cell(recalc=lambda: df_ab.copy())

# define the dependent/reactive Cell (with a single column 'c')
cell_c = Cell(
   recalc=lambda df: pd.DataFrame(df.a + df.b, columns=['c']), 
   sources=[cell_ab]
)
# and get its value
print(cell_c.value)
   c
0  8
1  5
2  3
3  1

# change source df and recalc its Cell...
df_ab.loc[0,'a']=100
cell_ab.recalc()

# cell_c has been updated in response
print(cell_c.value)
     c
0  105
1    5
2    3
3    1

Also see my response to this question.

kleynjan