I am interested in an immutable DataFrame to use as a reference table in a program, with read-only properties enforced after it has been initially constructed (in my case, inside a class's __init__() method).

I see that Index objects are frozen.

Is there a way to make an entire DataFrame immutable?

sanguineturtle
  • This question is almost like asking how to make a list immutable. These are issues at the root of the design of the data structure and probably cannot be changed. If you do want to enforce this so that a pointer to the DataFrame cannot accidentally change it, you should probably create your own getter and setter for it, i.e. put df in self._df and then create a getter that always returns a copy of the df and not a pointer to the original. You can still get to the original and change it by using self._df, but it will provide an additional layer of abstraction. – Joop Jul 24 '14 at 08:45

4 Answers

The StaticFrame package (of which I am an author) implements a Pandas-like interface, and many common Pandas operations, while enforcing immutability in underlying NumPy arrays and immutable Series and Frame containers.

You can make an entire Pandas DataFrame immutable by converting it to a StaticFrame Frame with static_frame.Frame.from_pandas(df). Then you can use it as a truly read-only table.

See StaticFrame documentation of this method: https://static-frame.readthedocs.io/en/latest/api_detail/frame.html#frame-constructor

flexatone
  • Excellent work! At what level is the immutability enforced? I am asking in order to know where we can expect overhead, especially when using StaticFrame to provide NumPy arrays for sklearn without copying. – dawid Sep 23 '20 at 08:43
  • Thank you! Immutability is enforced at the NumPy-array level: as NumPy arrays implement the Buffer Protocol (PEP 3118), we can set the `array.flags.writeable` Boolean attribute to enforce immutability. I expect this adds no additional overhead compared to mutable arrays. StaticFrame leverages immutable arrays to make many operations on `Series` and `Frame` fast and lightweight, as no copying of underlying arrays is necessary. This terminal animation demonstrates this benefit: https://raw.githubusercontent.com/InvestmentSystems/static-frame/master/doc/images/animate-low-memory-ops-verbose.svg – flexatone Sep 24 '20 at 15:35
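
The `array.flags.writeable` mechanism mentioned in the comment above can be tried in plain NumPy. This is only a minimal sketch of the flag itself, not of StaticFrame's internals; the variable names are made up for illustration.

```python
import numpy as np

values = np.array([1, 2, 3])
values.flags.writeable = False  # lock the underlying buffer

try:
    values[0] = 99  # any in-place write is now rejected
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A slice is a view on the same buffer and inherits the read-only flag.
view = values[1:]
print(error_message)
print(view.flags.writeable)
```

Because the flag lives on the array, no copy is made when it is set, which is consistent with the "no additional overhead" expectation stated in the comment.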

Try something like this:

import pandas as pd


class Bla(object):
    def __init__(self):
        self._df = pd.DataFrame(index=[1,2,3])

    @property
    def df(self):
        return self._df.copy()

This lets you get the df back using b.df, but you will not be able to assign to it. In short, you have a df in a class that behaves like an "immutable DataFrame" purely in the sense that it blocks changes to the original. The returned object is still a mutable DataFrame, however, so it will not behave like an immutable one in other ways; for example, you will not be able to use it as a dictionary key.
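
To make the behaviour concrete, here is a small sketch of the same copy-returning property in use. The class name `ReferenceTable` and the column `x` are made up for the example; it shows that edits to the returned copy do not reach the stored frame and that assigning to the property fails.

```python
import pandas as pd


class ReferenceTable:
    def __init__(self):
        # The real data lives in a "private" attribute.
        self._df = pd.DataFrame({"x": [1, 2, 3]})

    @property
    def df(self):
        # Every access hands out a fresh copy of the stored frame.
        return self._df.copy()


table = ReferenceTable()

snapshot = table.df
snapshot.loc[0, "x"] = 99  # changes only the copy

assignment_blocked = False
try:
    table.df = pd.DataFrame()  # the property has no setter
except AttributeError:
    assignment_blocked = True

print(table.df.loc[0, "x"], assignment_blocked)
```

The trade-off is the one raised in the comments: each access pays for a full copy, and `table._df` is still reachable by anyone who ignores the underscore convention.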

Joop
  • @sanguineturtle See you already asked a very similar question that was answered in a similar fashion! – Joop Jul 24 '14 at 09:05
  • The other question is concerning a read only attribute to a python class. I was hoping this question would focus more on immutable dataframes to protect that data within the class. That way I could have an unchangeable ``raw_data`` dataframe internally and a class attribute that is not updateable from outside the class. Great idea re: returning a copy. – sanguineturtle Jul 25 '14 at 03:53
  • While making a copy is expensive, this is pretty clever. – lnNoam Mar 14 '17 at 00:53
  • and I am under the impression that we could still `self.df.append()` the dataframe, which actually changes the `df` without even assigning to it? – stucash Jul 28 '21 at 04:33

If you truly want to make the DataFrame behave as immutable instead of using the copy solution by @Joop (which I would recommend), you could build upon the following structure.

Note that it is just a starting point.

It is basically a proxy data object that hides everything that would change the state and allows itself to be hashed; all instances of the same original data will have the same hash. There are probably modules that do the below in cooler ways, but I figured it could be educational as an example.

Some warnings:

  • Depending on how the string representation of the proxied object is constructed, two different proxied objects could get the same hash; however, the implementation is compatible with DataFrames among other objects.

  • Changes to the original object will affect the proxy object.

  • Equality checks will lead to some nasty infinite recursion if the other object tosses the equality question back (this is why list has a special case).

  • The DataFrame proxy maker helper is just a start. The problem is that any method that changes the state of the original object cannot be allowed; it needs to be manually overridden by the helper or entirely masked by the extraFilter parameter when instantiating _ReadOnly. See DataFrameProxy.sort.

  • The proxies won't show as derived from the proxied object's type.
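
The first warning can be illustrated with the standard library alone. The sketch below hashes the string representation in the same spirit as the proxy that follows; `string_hash` is a hypothetical helper written for this example, not part of the answer's code.

```python
import hashlib


def string_hash(obj):
    # Hash the string representation, as the proxy's __init__ does.
    return int(hashlib.md5(str(obj).encode("utf-8")).hexdigest(), 16)


a_list = [1, 2]
a_string = "[1, 2]"

# Two different objects with identical str() output share a hash.
same_hash = string_hash(a_list) == string_hash(a_string)
print(same_hash, a_list == a_string)
```

Any two objects whose string forms coincide would therefore be treated as equal by a proxy that compares hashes only.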

The Generic Read Only Proxy

This could be used on any object.

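import hashlib
import warnings

class _ReadOnly(object):

    def __init__(self, obj, extraFilter=tuple()):

        self.__dict__['_obj'] = obj
        self.__dict__['_d'] = None
        self.__dict__['_extraFilter'] = extraFilter
        # The md5 module exists only in Python 2; hashlib.md5 replaces it
        # and needs bytes as input in Python 3.
        self.__dict__['_hash'] = int(
            hashlib.md5(str(obj).encode('utf-8')).hexdigest(), 16)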

    @staticmethod                                                                                       
    def _cloak(obj):                                                                                    
        try:                                                                                            
            hash(obj)                                                                                   
            return obj                                                                                  
        except TypeError:                                                                               
            return _ReadOnly(obj)                                                                       

    def __getitem__(self, value):                                                                       

        return _ReadOnly._cloak(self._obj[value])                                                       

    def __setitem__(self, key, value):                                                                  

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __delitem__(self, key):                                                                         

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __getattr__(self, value):                                                                       

        if value in self.__dir__():                                                                     
            return _ReadOnly._cloak(getattr(self._obj, value))                                          
        elif value in dir(self._obj):                                                                   
            raise AttributeError("{0} attribute {1} is cloaked".format(                                 
                type(self._obj), value))                                                                
        else:                                                                                           
            raise AttributeError("{0} has no {1}".format(                                               
                type(self._obj), value))                                                                

    def __setattr__(self, key, value):                                                                  

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __delattr__(self, key):                                                                         

        raise TypeError(                                                                                
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))                              

    def __dir__(self):                                                                                  

        if self._d is None:                                                                             
            self.__dict__['_d'] = [                                                                     
                i for i in dir(self._obj) if not i.startswith('set')                                    
                and i not in self._extraFilter]                                                         
        return self._d                                                                                  

    def __repr__(self):                                                                                 

        return self._obj.__repr__()                                                                     

    def __call__(self, *args, **kwargs):                                                                

        if hasattr(self._obj, "__call__"):                                                              
            return self._obj(*args, **kwargs)                                                           
        else:                                                                                           
            raise TypeError("{0} not callable".format(type(self._obj)))                                 

    def __hash__(self):                                                                                 

        return self._hash                                                                               

    def __eq__(self, other):                                                                            

        try:                                                                                            
            return hash(self) == hash(other)                                                            
        except TypeError:                                                                               
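            if isinstance(other, list):
                try:
                    # Compare element-wise; all(zip(...)) would always be
                    # True because every zipped tuple is truthy.
                    return (len(self._obj) == len(other) and
                            all(a == b for a, b in zip(self._obj, other)))
                except Exception:
                    return False
            return other == self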

The DataFrame proxy

It should really be extended with more methods like sort, and all other state-changing methods that are not of interest should be filtered out.

You can either instantiate it with a DataFrame instance as the only argument, or give it the arguments you would use to create a DataFrame.

import pandas as pd

class DataFrameProxy(_ReadOnly):                                                                        

    EXTRA_FILTER = ('drop', 'drop_duplicates', 'dropna')                                                

    def __init__(self, *args, **kwargs):                                                                

        if (len(args) == 1 and
                not len(kwargs) and
                isinstance(args[0], pd.DataFrame)):

            super(DataFrameProxy, self).__init__(args[0],                                               
                DataFrameProxy.EXTRA_FILTER)                                                            

        else:                                                                                           

            super(DataFrameProxy, self).__init__(pd.DataFrame(*args, **kwargs),                         
                DataFrameProxy.EXTRA_FILTER)                                                            



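    def sort(self, *args, **kwargs):

        # Accept inplace only as a keyword so that the first positional
        # argument is not mistaken for it.
        if kwargs.pop('inplace', False):
            warnings.warn("Inplace sorting overridden")

        # DataFrame.sort was removed in pandas 0.20; sort_values replaces it.
        return self._obj.sort_values(*args, **kwargs)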

Finally:

Although this contraption was fun to make, why not simply have a DataFrame that you don't alter? If it is only exposed to you, it is better to just make sure you don't alter it.

deinonychusaur
  • There are some really neat and interesting ideas here. re: Finally - Others will be contributing to this class in a collaborative environment, and I would like to make sure other methods leave the private ``__raw_data`` attribute alone and use ``_dataset`` instead for making alterations. – sanguineturtle Jul 25 '14 at 03:51
  • The truly hard part is the fact that the original `DataFrame` can return objects that are *views* of the data, which may have methods that reach and affect the state of the original `DataFrame`. So depending on how little trust you want to put in your collaborators, it might be worth having a type-specific lookup in `_ReadOnly._cloak` (put it on module level instead) that gives specific proxies per type as far as possible. Also, maybe add a `.to_mutable` function so people can retrieve a copy of the `_obj` – deinonychusaur Jul 25 '14 at 06:46
  • Nice work. I've wished I could have an immutable DataFrame and this is a good start. I'd add self._obj.values.flags.writeable=False to the mix and perhaps override __setitem__ on any locator returned (e.g. df.iloc[0, 0] = 999). Probably can't control mutability but you have a good base – Phil Cooper Oct 25 '15 at 15:50

By looking into the pandas implementation and utilizing pandas capabilities, one can patch the DataFrame object to achieve this behavior. I implemented a function named make_dataframe_immutable(dataframe) to solve this problem. It was written for pandas==0.25.3.

EDIT: added a solution for pandas==1.0.5 and pandas==1.1.4

Newer pandas versions will probably require adjustments; hopefully that won't be too hard to do with the help of the tests below.

This solution is new and not thoroughly tested; any feedback will be appreciated.

It would be nice if someone could post here an inverse make_dataframe_mutable() method.
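
The core trick in the code below is to shadow a mutating method on a single instance with a wrapper that raises. Here is a minimal, pandas-free sketch of that idea; `Table`, `FrozenError`, `_blocked` and `freeze` are hypothetical names invented for this illustration and do not appear in the answer's module.

```python
import functools


class FrozenError(Exception):
    """Raised when a frozen object is asked to change (hypothetical)."""


class Table:
    def __init__(self):
        self.rows = []

    def insert(self, row):
        """Append a row in place."""
        self.rows.append(row)


def _blocked(method):
    # Build a replacement that keeps the name and docstring of the
    # original method but refuses to run it.
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        raise FrozenError(f"{method.__name__} is disabled on a frozen object")

    return wrapper


def freeze(table):
    # An instance attribute shadows the method defined on the class,
    # so only this one object is affected.
    table.insert = _blocked(table.insert)


frozen = Table()
frozen.insert("a")
freeze(frozen)

other = Table()
other.insert("b")  # other instances keep working
```

Because the patch sits on the instance rather than the class, it has to be repeated for every object that should be protected, and it only covers the methods that were explicitly listed.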

import functools

import numpy as np
import pandas as pd
from pandas.core.indexing import _NDFrameIndexer


def make_dataframe_immutable(df: pd.DataFrame):
    """
    Makes the given DataFrame immutable.
    I.e. after calling this function, one cannot modify the dataframe using the pandas interface.

    Upon an attempt to modify an immutable dataframe, an exception of type ImmutablePandas is raised.
    """
    if getattr(df, "_is_immutable", False):
        return
    df._is_immutable = True
    df._set_value = functools.wraps(df._set_value)(_raise_immutable_exception)
    df._setitem_slice = functools.wraps(df._setitem_slice)(_raise_immutable_exception)
    df._setitem_frame = functools.wraps(df._setitem_frame)(_raise_immutable_exception)
    df._setitem_array = functools.wraps(df._setitem_array)(_raise_immutable_exception)
    df._set_item = functools.wraps(df._set_item)(_raise_immutable_exception)
    df._data.delete = functools.wraps(df._data.delete)(_raise_immutable_exception)
    df.update = functools.wraps(df.update)(_raise_immutable_exception)
    df.insert = functools.wraps(df.insert)(_raise_immutable_exception)

    df._get_item_cache = _make_result_immutable(df._get_item_cache)

    # prevent modification through numpy arrays
    df._data.as_array = _make_numpy_result_readonly(df._data.as_array)

    _prevent_inplace_argument_in_function_calls(
        df,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.DataFrame, attr, None), '__code__') and
        #  'inplace' in getattr(pd.DataFrame, attr).__code__.co_varnames]
        (
            'bfill',
            'clip',
            'clip_lower',
            'clip_upper',
            'drop',
            'drop_duplicates',
            'dropna',
            'eval',
            'ffill',
            'fillna',
            'interpolate',
            'mask',
            'query',
            'replace',
            'reset_index',
            'set_axis',
            'set_index',
            'sort_index',
            'sort_values',
            'where',
            "astype",
            "assign",
            "reindex",
            "rename",
        ),
    )


def make_series_immutable(series: pd.Series):
    """
    Makes the given Series immutable.
    I.e. after calling this function, one cannot modify the series using the pandas interface.

    Upon an attempt to modify an immutable series, an exception of type ImmutablePandas is raised.
    """
    if getattr(series, "_is_immutable", False):
        return
    series._is_immutable = True
    series._set_with_engine = functools.wraps(series._set_with_engine)(_raise_immutable_exception)
    series._set_with = functools.wraps(series._set_with)(_raise_immutable_exception)
    series.set_value = functools.wraps(series.set_value)(_raise_immutable_exception)

    # prevent modification through numpy arrays
    series._data.external_values = _make_numpy_result_readonly(series._data.external_values)
    series._data.internal_values = _make_numpy_result_readonly(series._data.internal_values)
    series._data.get_values = _make_numpy_result_readonly(series._data.get_values)

    _prevent_inplace_argument_in_function_calls(
        series,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.Series, attr, None), '__code__') and
        #  'inplace' in getattr(pd.Series, attr).__code__.co_varnames]
        (
            "astype",
            'bfill',
            'clip',
            'clip_lower',
            'clip_upper',
            'drop',
            'drop_duplicates',
            'dropna',
            'ffill',
            'fillna',
            'interpolate',
            'mask',
            'replace',
            'reset_index',
            'set_axis',
            'sort_index',
            'sort_values',
            "valid",
            'where',
            "_set_name",
        ),
    )


class ImmutablePandas(Exception):
    pass


def _raise_immutable_exception(*args, **kwargs):
    raise ImmutablePandas("Cannot modify immutable dataframe. Please use df.copy()")


def _get_df_or_series_from_args(args):
    if len(args) >= 2 and (isinstance(args[1], pd.DataFrame) or isinstance(args[1], pd.Series)):
        return args[1]


def _safe__init__(self, *args, **kwargs):
    super(_NDFrameIndexer, self).__init__(*args, **kwargs)
    df_or_series = _get_df_or_series_from_args(args)
    if df_or_series is not None:
        if getattr(df_or_series, "_is_immutable", False):
            self._get_setitem_indexer = functools.wraps(self._get_setitem_indexer)(_raise_immutable_exception)


# This line is the greatest foul in this module - as it performs a global patch.
# Notice that a reload of this module incurs overriding this variable again and again. It is supported.
_NDFrameIndexer.__init__ = functools.wraps(_NDFrameIndexer.__init__)(_safe__init__)


def _make_numpy_result_readonly(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, np.ndarray):
            res.flags.writeable = False
        return res

    return wrapper


def _make_result_immutable(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, pd.Series):
            make_series_immutable(res)
        return res

    return wrapper


def _prevent_inplace_operation(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # TODO: here we assume that in-place is not given as a positional.
        #  remove this assumption, either by hard-coding the position for each method or by parsing the
        #  function signature.
        if kwargs.get("inplace", False):
            _raise_immutable_exception()
        return func(*args, **kwargs)

    return wrapper


def _prevent_inplace_argument_in_function_calls(obj, attributes):
    for attr in attributes:
        member = getattr(obj, attr)
        setattr(obj, attr, _prevent_inplace_operation(member))
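
The TODO in `_prevent_inplace_operation` mentions parsing the function signature so that a positional `inplace` is caught as well. One possible way to do that, sketched here with the standard `inspect` module, is shown below. `InplaceBlocked`, `prevent_inplace` and the stand-in `sort_values` function are assumptions made for the example; they are not part of the module above and have not been tested against pandas itself.

```python
import functools
import inspect


class InplaceBlocked(Exception):
    """Raised when a guarded call asks for in-place modification."""


def prevent_inplace(func):
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments onto parameter names.
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        if bound.arguments.get("inplace", False):
            raise InplaceBlocked(f"{func.__name__}: inplace is not allowed")
        return func(*args, **kwargs)

    return wrapper


# Stand-in for a method that accepts inplace as its third parameter.
def sort_values(by, ascending=True, inplace=False):
    return (by, ascending, inplace)


guarded = prevent_inplace(sort_values)
print(guarded("x"))
```

This sketch only looks at a named `inplace` parameter; a function that receives it through `**kwargs` would need an extra check on the keyword dictionary.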


pytest unit-tests

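import importlib
import warnings

import pandas as pd
import pytest

import immutable_pandas
from immutable_pandas import ImmutablePandas
from immutable_pandas import make_dataframe_immutable
from immutable_pandas import make_series_immutable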



def create_immutable_dataframe() -> pd.DataFrame:
    # Cannot be used as a fixture because pytest copies objects transparently, which makes the tests flaky
    immutable_dataframe = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
    make_dataframe_immutable(immutable_dataframe)
    return immutable_dataframe


def test_immutable_dataframe_cannot_change_with_direct_access():
    immutable_dataframe = create_immutable_dataframe()
    immutable_dataframe2 = immutable_dataframe.query("x == 2")
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        immutable_dataframe2["moshe"] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.x = 2
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["moshe"] = 56
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.insert(0, "z", [1, 2, 3, 4])


def test_immutable_dataframe_cannot_change_with_inplace_operations():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.eval("y=x+1", inplace=True)
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.assign(y=2, inplace=True)


def test_immutable_dataframe_cannot_change_with_loc():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.loc[2] = 1
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.iloc[1] = 4


def test_immutable_dataframe_cannot_change_with_columns_access():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"][2] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"].loc[2] = 123


def test_immutable_dataframe_cannot_del_column():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        del immutable_dataframe["x"]


def test_immutable_dataframe_cannot_be_modified_through_values():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ValueError, match="read-only"):
        immutable_dataframe.values[0, 0] = 1
    with pytest.raises(ValueError, match="read-only"):
        immutable_dataframe.as_matrix()[0, 0] = 1


def test_immutable_series_cannot_change_with_loc():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.loc[0] = 1
    with pytest.raises(ImmutablePandas):
        series.iloc[0] = 1


def test_immutable_series_cannot_change_with_inplace_operations():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.sort_index(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.sort_values(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.astype(int, inplace=True)


def test_series_cannot_be_modified_through_values():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ValueError, match="read-only"):
        series.get_values()[0] = 1234
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ValueError, match="read-only"):
        series.values[0] = 1234


def test_reloading_module_immutable_pandas_does_not_break_immutability():
    # We need to test the effects of reloading the module, because we modify the global variable
    #       _NDFrameIndexer.__init__ upon every reload of the module.
    df = create_immutable_dataframe()
    df2 = df.copy()
    immutable_pandas2 = importlib.reload(immutable_pandas)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df.astype(int, inplace=True)
    df2.astype(int, inplace=True)
    immutable_pandas2.make_dataframe_immutable(df2)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df2.astype(int, inplace=True)


Edit: here is an update tested on pandas==1.0.5 and on pandas==1.1.4

"""
Two methods to make pandas objects immutable.
    make_dataframe_immutable()
    make_series_immutable()
"""
import functools

import numpy as np
import pandas as pd
from pandas.core.indexing import _iLocIndexer
from pandas.core.indexing import _LocIndexer
from pandas.core.indexing import IndexingMixin


def make_dataframe_immutable(df: pd.DataFrame):
    """
    Makes the given DataFrame immutable.
    I.e. after calling this function, one cannot modify the dataframe using the pandas interface.

    Upon an attempt to modify an immutable dataframe, an exception of type ImmutablePandas is raised.
    """
    if getattr(df, "_is_immutable", False):
        return
    df._is_immutable = True
    df._set_value = functools.wraps(df._set_value)(_raise_immutable_exception)
    df._setitem_slice = functools.wraps(df._setitem_slice)(_raise_immutable_exception)
    df._setitem_frame = functools.wraps(df._setitem_frame)(_raise_immutable_exception)
    df._setitem_array = functools.wraps(df._setitem_array)(_raise_immutable_exception)
    df._set_item = functools.wraps(df._set_item)(_raise_immutable_exception)
    if hasattr(df, "_mgr"):
        # pandas==1.1.4
        df._mgr.idelete = functools.wraps(df._mgr.idelete)(_raise_immutable_exception)
    elif hasattr(df, "_data"):
        # pandas==1.0.5
        df._data.delete = functools.wraps(df._data.delete)(_raise_immutable_exception)
    df.update = functools.wraps(df.update)(_raise_immutable_exception)
    df.insert = functools.wraps(df.insert)(_raise_immutable_exception)

    df._get_item_cache = _make_result_immutable(df._get_item_cache)

    # prevent modification through numpy arrays
    df._data.as_array = _make_numpy_result_readonly(df._data.as_array)

    _prevent_inplace_argument_in_function_calls(
        df,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.DataFrame, attr, None), '__code__') and
        #  'inplace' in getattr(pd.DataFrame, attr).__code__.co_varnames]
        (
            "bfill",
            "clip",
            "drop",
            "drop_duplicates",
            "dropna",
            "eval",
            "ffill",
            "fillna",
            "interpolate",
            "mask",
            "query",
            "replace",
            "reset_index",
            "set_axis",
            "set_index",
            "sort_index",
            "sort_values",
            "where",
            "astype",
            "assign",
            "reindex",
            "rename",
        ),
    )


def make_series_immutable(series: pd.Series):
    """
    Makes the given Series immutable.
    I.e. after calling this function, one cannot modify the series using the pandas interface.

    Upon an attempt to modify an immutable series, an exception of type ImmutablePandas is raised.
    """
    if getattr(series, "_is_immutable", False):
        return
    series._is_immutable = True
    series._set_with_engine = functools.wraps(series._set_with_engine)(_raise_immutable_exception)
    series._set_with = functools.wraps(series._set_with)(_raise_immutable_exception)

    # prevent modification through numpy arrays
    series._data.external_values = _make_numpy_result_readonly(series._data.external_values)
    series._data.internal_values = _make_numpy_result_readonly(series._data.internal_values)

    _prevent_inplace_argument_in_function_calls(
        series,
        # This list was obtained by manual inspection +
        #  [attr for attr in dir(d) if hasattr(getattr(pd.Series, attr, None), '__code__') and
        #  'inplace' in getattr(pd.Series, attr).__code__.co_varnames]
        (
            "astype",
            "bfill",
            "clip",
            "drop",
            "drop_duplicates",
            "dropna",
            "ffill",
            "fillna",
            "interpolate",
            "mask",
            "replace",
            "reset_index",
            "set_axis",
            "sort_index",
            "sort_values",
            "where",
            "_set_name",
        ),
    )


class ImmutablePandas(Exception):
    pass


def _raise_immutable_exception(*args, **kwargs):
    raise ImmutablePandas("Cannot modify immutable dataframe. Please use df.copy()")


def _get_df_or_series_from_args(args):
    if len(args) >= 2 and (isinstance(args[1], pd.DataFrame) or isinstance(args[1], pd.Series)):
        return args[1]


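def _protect_indexer(loc_func):
    # Pass-through wrapper; it is not used elsewhere in the code shown here.
    @functools.wraps(loc_func)
    def wrapper(*args, **kwargs):
        res = loc_func(*args, **kwargs)
        return res

    return wrapper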


def _safe__init__(cls, self, *args, **kwargs):
    super(cls, self).__init__(*args, **kwargs)
    df_or_series = _get_df_or_series_from_args(args)
    if df_or_series is not None:
        if getattr(df_or_series, "_is_immutable", False):
            self._get_setitem_indexer = functools.wraps(self._get_setitem_indexer)(_raise_immutable_exception)


@functools.wraps(IndexingMixin.loc)
def _safe_loc(self):
    loc = _LocIndexer("loc", self)
    if getattr(self, "_is_immutable", False):
        # Edit also loc._setitem_with_indexer
        loc._get_setitem_indexer = functools.wraps(loc._get_setitem_indexer)(_raise_immutable_exception)
    return loc


@functools.wraps(IndexingMixin.iloc)
def _safe_iloc(self):
    iloc = _iLocIndexer("iloc", self)
    if getattr(self, "_is_immutable", False):
        # Edit also iloc._setitem_with_indexer
        iloc._get_setitem_indexer = functools.wraps(iloc._get_setitem_indexer)(_raise_immutable_exception)
    return iloc


# wraps
pd.DataFrame.loc = property(_safe_loc)
pd.Series.loc = property(_safe_loc)
pd.DataFrame.iloc = property(_safe_iloc)
pd.Series.iloc = property(_safe_iloc)


def _make_numpy_result_readonly(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, np.ndarray):
            res.flags.writeable = False
        return res

    return wrapper


def _make_result_immutable(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        res = func(*args, **kwargs)
        if isinstance(res, pd.Series):
            make_series_immutable(res)
        return res

    return wrapper


def _prevent_inplace_operation(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # TODO: here we assume that in-place is not given as a positional.
        #  remove this assumption, either by hard-coding the position for each method or by parsing the
        #  function signature.
        if kwargs.get("inplace", False):
            _raise_immutable_exception()
        return func(*args, **kwargs)

    return wrapper


def _prevent_inplace_argument_in_function_calls(obj, attributes):
    for attr in attributes:
        member = getattr(obj, attr)
        setattr(obj, attr, _prevent_inplace_operation(member))


And the pytest file:

import importlib
import warnings

import pandas as pd
import pytest

import immutable_pandas
from immutable_pandas import ImmutablePandas
from immutable_pandas import make_dataframe_immutable
from immutable_pandas import make_series_immutable


def create_immutable_dataframe() -> pd.DataFrame:
    # Cannot be used as a fixture because pytest copies objects transparently, which makes the tests flaky
    immutable_dataframe = pd.DataFrame({"x": [1, 2, 3, 4], "y": [4, 5, 6, 7]})
    make_dataframe_immutable(immutable_dataframe)
    return immutable_dataframe


def test_immutable_dataframe_cannot_change_with_direct_access():
    immutable_dataframe = create_immutable_dataframe()
    immutable_dataframe2 = immutable_dataframe.query("x == 2")
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        immutable_dataframe2["moshe"] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.x = 2
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["moshe"] = 56
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.insert(0, "z", [1, 2, 3, 4])


def test_immutable_dataframe_cannot_change_with_inplace_operations():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.eval("y=x+1", inplace=True)
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.assign(y=2, inplace=True)


def test_immutable_dataframe_cannot_change_with_loc():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.loc[2] = 1
    with pytest.raises(ImmutablePandas):
        immutable_dataframe.iloc[1] = 4


def test_immutable_dataframe_cannot_change_with_columns_access():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"][2] = 123
    with pytest.raises(ImmutablePandas):
        immutable_dataframe["x"].loc[2] = 123


def test_immutable_dataframe_cannot_del_column():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ImmutablePandas):
        del immutable_dataframe["x"]


def test_immutable_dataframe_cannot_be_modified_through_values():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(ValueError, match="read-only"):
        immutable_dataframe.values[0, 0] = 1
    # with pytest.raises(ValueError, match="read-only"):
    #     immutable_dataframe.as_matrix()[0, 0] = 1


def test_immutable_series_cannot_change_with_loc():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.loc[0] = 1
    with pytest.raises(ImmutablePandas):
        series.iloc[0] = 1


def test_immutable_series_cannot_change_with_inplace_operations():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ImmutablePandas):
        series.sort_index(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.sort_values(inplace=True)
    with pytest.raises(ImmutablePandas):
        series.astype(int, inplace=True)


def test_series_cannot_be_modified_through_values():
    series = pd.Series([1, 2, 3, 4])
    make_series_immutable(series)
    with pytest.raises(ValueError, match="read-only"):
        series.values[0] = 1234


def test_reloading_module_immutable_pandas_does_not_break_immutability():
    # We need to test the effects of reloading the module, because we modify the global variable
    #       pd.DataFrame.loc, pd.DataFrame.iloc,
    #       pd.Series.loc, pd.Series.iloc
    #       upon every reload of the module.
    df = create_immutable_dataframe()
    df2 = df.copy()
    immutable_pandas2 = importlib.reload(immutable_pandas)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df.astype(int, inplace=True)
    immutable_pandas2.make_dataframe_immutable(df2)
    with pytest.raises(immutable_pandas2.ImmutablePandas):
        df2.astype(int, inplace=True)


def test_at_and_iat_crash():
    immutable_dataframe = create_immutable_dataframe()
    with pytest.raises(immutable_pandas.ImmutablePandas):
        immutable_dataframe.iat[0, 0] = 1
    with pytest.raises(immutable_pandas.ImmutablePandas):
        immutable_dataframe.at[0, "x"] = 1


Jeromino