automatically updating columns in pandas?

Question

In my mind, pandas provides me with a virtual spreadsheet, like Excel. One feature of Excel spreadsheets is that you can set a column to a formula. For instance:

          T_c    T    T_r
Series 1    1    ?    T/T_c
Series 2    2    ?    T/T_c

Is there any way to create a DataFrame such that the column T_r tracks any changes made to column T? In other words, any time T gets updated, T_r gets updated too.

I don't think such a thing exists in pandas. Basically, you have to write a function and call it every time you need the values — Roman Pekar, Nov 14 '13 at 20:55
see this question, pretty similar: http://stackoverflow.com/questions/18024742/how-to-store-formulas-instead-of-values-in-pandas-dataframe — Jeff, Nov 14 '13 at 20:55
that's... yeah, it's gonna be easier to just have update functions — RodericDay, Nov 14 '13 at 21:09
So this is like an observer on the columns... it would be interesting to see the use case, maybe there is another way to do it — Andy Hayden, Nov 14 '13 at 21:34
@AndyHayden I think it's something like computed columns in a SQL view, which would be good to have. For example, columns like this could be created with something like `df['comp'] = df.computed(lambda x: x['T'] / x['T_c'])`, maybe with an axis parameter too (?), and then calculated every time they are accessed — Roman Pekar, Nov 15 '13 at 06:52
@RomanPekar You could add it as a property, but that is unsatisfying. There's definitely an open issue about this. — Andy Hayden, Nov 15 '13 at 07:29
Subclassing and using properties is something I may try after I get the actual code working. — RodericDay, Nov 15 '13 at 20:15

score 4 · Answer 1 · answered Dec 19 '19 at 15:51

[Answer constructed from comments above as I came here looking for it myself.]

In the current version of pandas there is no such way that I know of.

To achieve the same thing with a bit of bookkeeping, you could have a function to create the computed column

def update_computed_column(df):
    # Overwrite column 'c' in place using the current values of 'a' and 'b'
    df['c'] = df['a'] / df['b']

and then call it whenever you are interested in checking the value.
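As a minimal sketch of that bookkeeping, here is the same idea applied to the columns from the question. The numeric values for T are made up for illustration, since the question leaves them unspecified. The point is that T_r goes stale as soon as T changes and stays stale until the function is called again.

```python
import pandas as pd


def update_computed_column(df):
    # Recompute T_r in place from the current T and T_c values.
    df["T_r"] = df["T"] / df["T_c"]


# Illustrative data only: the T values are assumptions, not from the question.
df = pd.DataFrame(
    {"T_c": [1.0, 2.0], "T": [3.0, 5.0]},
    index=["Series 1", "Series 2"],
)

update_computed_column(df)
stale = df["T_r"].tolist()   # computed from the original T

df["T"] = [4.0, 8.0]         # T_r is NOT updated by this assignment
update_computed_column(df)   # explicit refresh
fresh = df["T_r"].tolist()   # computed from the new T
```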

Alternatively, wrap the DataFrame in a class with a getter.

class WrappedDataFrame:
    def __init__(self, df):
        self._df = df
        self._update_computed_columns()

    def _update_computed_columns(self):
        # Define all your computed columns
        self._df['c'] = self._df['a'] / self._df['b']

    @property
    def df(self):
        self._update_computed_columns()
        return self._df

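The computed columns are then recalculated on every access to the df property, so after you modify the data, the next read through a.df is up to date. Note that the recomputation happens on access, not at the moment of modification.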

>>> from pandas import DataFrame
>>> a = WrappedDataFrame(DataFrame.from_dict({'a': [1, 2, 3], 'b': [4, 5, 6]}))
>>> print(a.df)
   a  b     c
0  1  4  0.25
1  2  5  0.40
2  3  6  0.50
>>> a.df['a'] = [7, 8, 9]
>>> print(a.df)
   a  b     c
0  7  4  1.75
1  8  5  1.60
2  9  6  1.50

This could be augmented with methods to add new computed columns, storing their formulae as functions in a private dictionary, etc.
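A rough sketch of that extension might look like the following. The class name ComputedFrame and the method add_computed are hypothetical names chosen for this example, not part of pandas. Formulas are stored in a plain dict and re-applied in registration order on each access, so a later formula can build on an earlier computed column.

```python
import pandas as pd


class ComputedFrame:
    """Hypothetical wrapper that re-applies registered formulas on access."""

    def __init__(self, df):
        self._df = df
        self._formulas = {}  # column name -> function(DataFrame) -> Series

    def add_computed(self, name, func):
        # Register (or replace) the formula for a computed column.
        self._formulas[name] = func

    @property
    def df(self):
        # Dicts preserve insertion order, so formulas run in the order
        # they were registered.
        for name, func in self._formulas.items():
            self._df[name] = func(self._df)
        return self._df


frame = ComputedFrame(pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}))
frame.add_computed("c", lambda d: d["a"] / d["b"])
frame.add_computed("d", lambda d: d["c"] * 2)  # depends on computed column c

frame.df["a"] = [7, 8, 9]
result = frame.df  # c and d are recomputed from the new values of a
```

Recomputing every formula on every access is simple but wasteful for large frames; tracking which inputs changed would need considerably more machinery.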

Alternatively, you could subclass DataFrame and integrate the computation directly. Which approach is best depends on your purpose.
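One way the subclassing route could look, as a sketch only: a subclass that exposes the derived quantity as a property, along the lines suggested in the comments. The class name ThermoFrame and the sample values are assumptions for illustration. Overriding _constructor follows the pandas guidance on subclassing, so that operations returning a new frame keep the subclass type.

```python
import pandas as pd


class ThermoFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass with a derived T_r attribute."""

    @property
    def _constructor(self):
        # Make pandas operations return ThermoFrame instead of DataFrame.
        return ThermoFrame

    @property
    def T_r(self):
        # Evaluated on every access, so it always reflects the current data.
        # Use self["T"] rather than self.T, because .T is the transpose.
        return self["T"] / self["T_c"]


# Illustrative values only.
frame = ThermoFrame({"T_c": [1.0, 2.0], "T": [3.0, 5.0]})
before = frame.T_r.tolist()

frame["T"] = [4.0, 8.0]
after = frame.T_r.tolist()
```

The trade-off is that T_r here is an attribute, not a stored column: it does not appear in frame.columns or in the printed frame, and it is not written out by methods such as to_csv.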
