
I think of pandas as giving me a virtual spreadsheet, like Excel. In an Excel spreadsheet you can define a column as a formula of other columns. For instance:

          T_c    T    T_r
Series 1    1    ?    T/T_c
Series 2    2    ?    T/T_c

Is there any way to create a DataFrame such that the column T_r tracks any changes made to column T? In other words, any time T gets updated, T_r gets updated too.

RodericDay
  • I don't think such a thing exists in pandas. Basically, you have to write a function and call it every time you need the values – Roman Pekar Nov 14 '13 at 20:55
  • see this question, pretty similar: http://stackoverflow.com/questions/18024742/how-to-store-formulas-instead-of-values-in-pandas-dataframe – Jeff Nov 14 '13 at 20:55
  • that's... yeah, it's gonna be easier to just have update functions – RodericDay Nov 14 '13 at 21:09
  • So this is like an observer of the columns... It would be interesting to see the use case; maybe there is another way to do it – Andy Hayden Nov 14 '13 at 21:34
  • @AndyHayden I think it's something like computed columns in an SQL view; it would be good to have. For example, such columns could be created like `df['comp'] = df.computed(lambda x: x['T'] / x['T_c'])`, maybe with an axis parameter too (?), and then calculated every time they are accessed – Roman Pekar Nov 15 '13 at 06:52
  • @RomanPekar It could be added as a property, but that is unsatisfying; there's definitely an open issue about this. – Andy Hayden Nov 15 '13 at 07:29
  • Subclassing and using properties is something I may try after I get the actual code working. – RodericDay Nov 15 '13 at 20:15
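For one-off use, pandas already has something close to the `df.computed(...)` idea: `DataFrame.assign` accepts a callable that receives the frame and is evaluated at the moment `assign` is called. It returns a new DataFrame and does not track later changes, so T_r is only refreshed when you call it again. A minimal sketch, assuming made-up T values and a hypothetical helper name `with_t_r`:

```python
import pandas as pd

# Same layout as the table in the question; the T values are made up for illustration.
df = pd.DataFrame({'T_c': [1.0, 2.0], 'T': [3.0, 5.0]},
                  index=['Series 1', 'Series 2'])

# Hypothetical helper: the callable passed to assign() receives the DataFrame
# and is evaluated at call time, so T_r reflects whatever T holds right now.
def with_t_r(frame):
    return frame.assign(T_r=lambda x: x['T'] / x['T_c'])

print(with_t_r(df))
df['T'] = [4.0, 8.0]   # update T ...
print(with_t_r(df))    # ... and T_r follows on the next call
```

Note that `assign` leaves `df` itself unchanged; T_r exists only in the returned copy.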



[Answer constructed from comments above as I came here looking for it myself.]

As far as I know, the current version of pandas has no built-in way to do this.

To get the same effect with a bit of bookkeeping, you could write a function that creates the computed column:

def update_computed_column(df):
    df['c'] = df['a'] / df['b']

and then call it every time before you read the computed values.
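A quick run of that function on made-up data shows the bookkeeping involved: after an update, 'c' is stale until the function is called again. This is a minimal sketch, not a complete workflow:

```python
import pandas as pd

def update_computed_column(df):
    # Recompute the derived column in place from its inputs
    df['c'] = df['a'] / df['b']

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
update_computed_column(df)

df['a'] = [7, 8, 9]            # 'c' is now stale: still 0.25, 0.4, 0.5
update_computed_column(df)     # bring it back in sync
print(df)
```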

Alternatively, wrap the DataFrame in a class whose property getter recomputes the columns on each access:

class WrappedDataFrame:
    def __init__(self, df):
        self._df = df
        self._update_computed_columns()

    def _update_computed_columns(self):
        # Define all your computed columns
        self._df['c'] = self._df['a'] / self._df['b']

    @property
    def df(self):
        self._update_computed_columns()
        return self._df

Because df is a property, every access recomputes the columns, so changes to the data are reflected the next time you read a.df:

>>> from pandas import DataFrame
>>> a = WrappedDataFrame(DataFrame.from_dict({'a': [1, 2, 3], 'b': [4, 5, 6]}))
>>> print(a.df)
   a  b     c
0  1  4  0.25
1  2  5  0.40
2  3  6  0.50
>>> a.df['a'] = [7, 8, 9]
>>> print(a.df)
   a  b     c
0  7  4  1.75
1  8  5  1.60
2  9  6  1.50

This could be augmented with methods to add new computed columns, storing their formulae as functions in a private dictionary, etc.
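One way that extension might look is sketched below. The method name `add_computed_column` and the `_formulas` dictionary are hypothetical choices for illustration, not part of pandas; formulas are applied in the order they were registered, so a later formula can use an earlier computed column.

```python
import pandas as pd

class WrappedDataFrame:
    """Hypothetical extension: formulas stored by column name, recomputed on access."""

    def __init__(self, df):
        self._df = df
        self._formulas = {}  # column name -> function(DataFrame) -> Series

    def add_computed_column(self, name, func):
        # Register a formula; it is evaluated on every access to .df
        self._formulas[name] = func

    def _update_computed_columns(self):
        # Dicts preserve insertion order, so formulas run in registration order
        for name, func in self._formulas.items():
            self._df[name] = func(self._df)

    @property
    def df(self):
        self._update_computed_columns()
        return self._df

w = WrappedDataFrame(pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}))
w.add_computed_column('c', lambda d: d['a'] / d['b'])
w.df['a'] = [7, 8, 9]
print(w.df)
```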

Alternatively, you could subclass DataFrame and integrate the computation directly; which approach is better depends on your purpose.
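A minimal sketch of the subclassing route, using the column names from the question. The class name `TemperatureFrame` is hypothetical; overriding `_constructor` follows the pattern pandas describes for subclasses so that operations returning new frames keep the subclass type. Here T_r is a read-only property computed from the current T and T_c, not a stored column, so it will not appear when the frame is printed.

```python
import pandas as pd

class TemperatureFrame(pd.DataFrame):
    # Keep the subclass type through pandas operations that build new frames
    @property
    def _constructor(self):
        return TemperatureFrame

    # T_r is computed from the current T and T_c on every access;
    # it is not stored as a column.
    @property
    def T_r(self):
        return self['T'] / self['T_c']

df = TemperatureFrame({'T_c': [1.0, 2.0], 'T': [3.0, 5.0]},
                      index=['Series 1', 'Series 2'])
df['T'] = [4.0, 8.0]
print(df.T_r)
```

Note that df.T is pandas' transpose, so the T column has to be read as df['T'] rather than as an attribute.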

Cai