
I have a group of functions that each return a pandas DataFrame, and the contents of those DataFrames change over time as the underlying data is updated. The functions are stored in a dictionary, which we use to look up the specific function that returns the DataFrame we need.

What I need is a way to keep this data unchanged (fixed) over time, even after an update. I tried using `MappingProxyType` to create an immutable dictionary, but it didn't help. Here's what I tried:

from types import MappingProxyType

import pandas as pd

def func_x():
    x = pd.DataFrame({"col1": [1,2,3], "col2": [4,5,6]})
    return x


def func_y():
    y = pd.DataFrame({"col3": [7,8,9], "col4": [10,20,30]})
    return y


def func_z():
    z = pd.DataFrame({"col5": [40,50,60], "col6": [70,80,90]})
    return z


def collect_funcs_proxy(func):
    collection_proxy = MappingProxyType(
        {"func_x": func_x, "func_y": func_y, "func_z": func_z}
    )
    return collection_proxy[func]()

I mean, when the values of `x` inside `func_x` change, I need `collect_funcs_proxy` to keep returning the old values from `func_x` and ignore the update.

Is there any way I can achieve this using MappingProxyType or something else?
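A note on why `MappingProxyType` alone cannot do this: it only makes the mapping read-only (keys cannot be added, removed, or rebound). It does nothing to the values, and here the values are functions that rebuild their DataFrame on every call, so calling through the proxy always yields the current data. A minimal sketch of an alternative that works within a single Python session is to call each function once, store deep copies of the results, and hand out further copies on request. `SOURCE`, `SNAPSHOT`, and `get_frozen` are hypothetical names, and `SOURCE` merely simulates the database being updated; the snapshot also lives only as long as the process.

```python
from types import MappingProxyType

import pandas as pd

# Hypothetical stand-in for the database the real functions read from.
SOURCE = {"col1": [1, 2, 3], "col2": [4, 5, 6]}


def func_x():
    # Rebuilds the DataFrame from SOURCE on every call.
    return pd.DataFrame(SOURCE)


def func_y():
    return pd.DataFrame({"col3": [7, 8, 9], "col4": [10, 20, 30]})


FUNCS = {"func_x": func_x, "func_y": func_y}

# A proxy over the functions is read-only...
func_proxy = MappingProxyType(FUNCS)
try:
    func_proxy["func_x"] = func_y
except TypeError:
    print("the proxy rejects item assignment")

# ...but calling through it still runs the live function, so it cannot
# freeze the data. Instead, call each function once and keep deep copies.
SNAPSHOT = MappingProxyType(
    {name: func().copy(deep=True) for name, func in FUNCS.items()}
)


def get_frozen(name):
    # Hand out a copy so callers cannot modify the stored snapshot in place.
    return SNAPSHOT[name].copy(deep=True)


# Simulate the periodic data update: one new row arrives.
SOURCE["col1"].append(4)
SOURCE["col2"].append(7)

print(len(func_proxy["func_x"]()))  # 4: the live function sees the new row
print(len(get_frozen("func_x")))    # 3: the snapshot does not
```

Because the snapshot is taken at import time, it only protects against changes made later in the same process; it does not preserve data across separate test runs weeks or months apart.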

Nemra Khalil
  • What do you mean by _"even after the update"_? – Timus Feb 19 '22 at 21:02
  • If the data has been updated (changed), I want to keep the old version – Nemra Khalil Feb 19 '22 at 21:05
  • So you basically want to make copies of your dataframes at some points? – Thierry Lathuille Feb 19 '22 at 21:09
  • @ThierryLathuille Yes! But I want to make sure that these copies won't be affected when the original dataframes change – Nemra Khalil Feb 19 '22 at 21:12
  • Can you elaborate on what you mean by `change x inside func_x`? The function is a closure, and all it does is generate a dataframe. Do you mean you don't want the function body to change? – gold_cy Feb 19 '22 at 21:41
  • @gold_cy I mean that if the dataframe returned from `func_x` changes, I want to keep the older version that was returned before the update. Does that make sense? – Nemra Khalil Feb 19 '22 at 21:47
  • So assign it to a variable? Honestly, I don't understand the logic here. – gold_cy Feb 19 '22 at 21:50
  • Is [copy](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.copy.html) with the default `deep=True` not what you want? – Thierry Lathuille Feb 19 '22 at 21:53
  • @ThierryLathuille @gold_cy Let me explain further. Take `func_x` as it is now. If the df inside `func_x` changes because new data has been added, `func_x` will return a new df, right? For testing purposes, whenever I call `func_x`, I want it to return the older version of the df, not the newer one with the new data, because the new data makes my tests fail. – Nemra Khalil Feb 19 '22 at 21:59
  • @ThierryLathuille Check this link https://stackoverflow.com/questions/24928306/pandas-immutable-dataframe to see what I mean, although it didn't help either. – Nemra Khalil Feb 19 '22 at 22:01
  • I don't get it. Could you give a concrete example why the setup in your question doesn't work for you? – Timus Feb 19 '22 at 23:34
  • 2
    Nemra Khalil, you mentioned new data, you mentioned change. However, the code that you show here **never** makes change to any dataframe. Why not show us? Can you add a few lines of code to show us *how* you change *which* dataframe and *what* turns out to be against your expectation? – Raymond Kwok Feb 19 '22 at 23:54
  • @RaymondKwok Sorry, let me explain further. In my daily workflow, every few months we update our database to get the newer data. Before the update, we had an older dataframe on which we ran some tests, and they passed. After updating all our dataframes, the tests fail because the new data contains new rows. What I need is simply to find a way that will keep the data from being updated, so that my tests keep passing. – Nemra Khalil Feb 21 '22 at 19:15
  • Thanks for the explanation. `every few months we update our database to get the newer data`. `What I need is simply to find a way that will keep the data from being updated`. You can't **do update** and **keep the data from being updated** at the same time. Perhaps when you update, you label which rows are new, and when you test, you use the labels to filter out the new data so your tests always run on the old data. – Raymond Kwok Feb 21 '22 at 23:04
  • You could pickle the old objects. Is that an option? – Timus Feb 23 '22 at 14:33
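The pickling suggestion fits the testing scenario described above reasonably well, because the snapshot lives on disk and survives between runs and database updates. A minimal sketch, assuming a hypothetical helper `frozen_frame` that saves a function's result to `<name>.pkl` on first use and reads that file on every later call. A temporary directory stands in here for a fixtures directory you would keep alongside your tests, and `SOURCE` again only simulates the update. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the data source that grows with each update.
SOURCE = {"col1": [1, 2, 3], "col2": [4, 5, 6]}


def func_x():
    return pd.DataFrame(SOURCE)


def frozen_frame(name, func, fixture_dir):
    """Return the pickled snapshot for `name`, creating it on first use.

    Hypothetical helper: the first call saves func()'s result to
    fixture_dir/<name>.pkl; later calls read that file and ignore func.
    """
    path = Path(fixture_dir) / f"{name}.pkl"
    if not path.exists():
        func().to_pickle(path)
    return pd.read_pickle(path)


with tempfile.TemporaryDirectory() as fixture_dir:
    first = frozen_frame("func_x", func_x, fixture_dir)

    # Simulate the periodic database update.
    SOURCE["col1"].append(4)
    SOURCE["col2"].append(7)

    second = frozen_frame("func_x", func_x, fixture_dir)
    print(len(func_x()), len(first), len(second))  # 4 3 3
```

To refresh the baseline deliberately, delete the `.pkl` file and rerun; the next call captures the current data.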

0 Answers
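The row-labelling idea from the comments can be sketched as follows. It assumes each update only appends rows (it does not revise existing ones) and that a load date can be stored with every row. The `loaded_at` column, the dates, and `BASELINE` are all hypothetical.

```python
import pandas as pd

# Data as of the last test baseline; "loaded_at" is a hypothetical label column.
df = pd.DataFrame(
    {
        "col1": [1, 2, 3],
        "col2": [4, 5, 6],
        "loaded_at": pd.to_datetime(["2022-01-01"] * 3),
    }
)

# A later update appends new rows, each stamped with its load date.
new_rows = pd.DataFrame(
    {"col1": [4], "col2": [7], "loaded_at": pd.to_datetime(["2022-04-01"])}
)
df = pd.concat([df, new_rows], ignore_index=True)

# Tests filter back to the rows that existed at the baseline date,
# then drop the label so the frame matches the original columns.
BASELINE = pd.Timestamp("2022-01-01")
baseline_df = df[df["loaded_at"] <= BASELINE].drop(columns="loaded_at")

print(len(df), len(baseline_df))  # 4 3
```

Unlike a pickled snapshot, this keeps a single up-to-date table and reconstructs the old view on demand, but it cannot recover old values if an update overwrites existing rows.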