I want to patch a method so that it runs the original method with additional code before and after it. In particular, I'm running tests inside a pyfakefs in-memory file system, but I sometimes need to use the real file system because some packages do not work on the fake one (pybedtools in my case).

There is probably a simple way to do this, but I can't figure it out after many, many tries. Is this possible?

Just for an example, below I'm trying to patch to_csv in pandas.

import os
import tempfile
from unittest.mock import patch
import pandas as pd
from pyfakefs.fake_filesystem_unittest import Patcher


df_intervals = pd.DataFrame([
     ['1', 10, 20],
     ['20', 45, 55]],
     columns=['chrom', 'start', 'end'])


with Patcher(use_known_patches=True) as patcher:
    # As expected, writing to the fake filesystem works
    fname = tempfile.NamedTemporaryFile()
    df_intervals.to_csv(fname.name)
    assert not os.path.exists(fname.name)
    assert patcher.fs.isfile(fname.name)

    # But, how do I patch `to_csv` to write to the real filesystem? My failed attempts:
    # Attempt 1
    # TypeError: super(type, obj): obj must be an instance or subtype of type
    class patched_DataFrame(pd.DataFrame):
        def to_csv(self, fname):
            print('Pausing fake file system')
            patcher.pause()
            super().to_csv(fname)
            print('Resuming fake file system')
            patcher.resume()

    with patch.object(pd.core.generic.NDFrame, 'to_csv', new=patched_DataFrame.to_csv):
        df_intervals.to_csv(fname.name)

    # Attempt 2: TypeError: 'patched_DataFrame' object is not callable
    with patch('pandas.core.frame.DataFrame', new_callable=patched_DataFrame):
        df_intervals.to_csv(fname.name)

    # Attempt 3: infinite recursion
    def patched_to_csv(self, fname):
        print('Pausing fake file system')
        patcher.pause()
        self.to_csv(fname)
        print('Resuming fake file system')
        patcher.resume()

    with patch.object(pd.core.generic.NDFrame, 'to_csv', new=patched_to_csv):
        df_intervals.to_csv(fname.name)
kdauria

2 Answers

One (not very elegant) possibility would be to use the third approach and avoid the recursion by calling the original to_csv method, saved before the patch is applied:

from pyfakefs.fake_filesystem_unittest import Patcher, Pause


with Patcher() as patcher:
    ...

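    # Save the unpatched method first, so the replacement can call it
    # without looking up the patched attribute again (which would recurse)
    original_to_csv = pd.core.generic.NDFrame.to_csv

    def patched_to_csv(self, fname):
        with Pause(patcher.fs):
            return original_to_csv(self, fname)
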
    with patch.object(pd.core.generic.NDFrame, 'to_csv', new=patched_to_csv):
        df_intervals.to_csv(fname.name)

Note that I used the context manager for pause/resume: it makes it easy to propagate a return value from the patched function if needed, and it is less error-prone.
Also note that use_known_patches is True by default.
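
To make the "save the original, then wrap it" pattern easier to try out in isolation, here is a minimal sketch that uses only the standard library. `Table` and `pause_stand_in` are hypothetical stand-ins for `pandas.DataFrame` and pyfakefs' `Pause(patcher.fs)`; they are not part of either library. The sketch only aims to show that the return value passes through and that the original method is restored afterwards.

```python
from contextlib import contextmanager
from unittest.mock import patch

events = []  # records the order of calls so the wrapping is visible


@contextmanager
def pause_stand_in():
    """Hypothetical stand-in for pyfakefs' Pause(patcher.fs)."""
    events.append("pause")
    try:
        yield
    finally:
        events.append("resume")  # runs even if the wrapped call raises


class Table:
    """Hypothetical stand-in for pandas.DataFrame."""

    def to_csv(self, fname):
        events.append(f"write {fname}")
        return len(fname)


# Capture the original *before* patching. Calling self.to_csv inside the
# replacement would look up the patched attribute again and recurse.
original_to_csv = Table.to_csv


def patched_to_csv(self, fname):
    with pause_stand_in():
        return original_to_csv(self, fname)  # return value passes through


with patch.object(Table, "to_csv", new=patched_to_csv):
    result = Table().to_csv("out.csv")

# After the with block, patch.object has put the original method back.
restored = Table.to_csv is original_to_csv
```

Because the context manager's exit runs in a `finally` clause, the "resume" step still happens if the wrapped call raises, which is the main reason it is less error-prone than separate pause()/resume() calls.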

Disclaimer:
I'm a contributor to pyfakefs.

Update: I changed the answer, because the previous attempt to avoid the recursion was wrong.

MrBean Bremen

Here is one way to do it.

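from unittest.mock import patch

import pandas as pd
import pybedtools
from pyfakefs.fake_filesystem_unittest import Patcher


df_intervals = pd.DataFrame([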
     ['1', 10, 20],
     ['20', 45, 55]],
     columns=['chrom', 'start', 'end'])


def fakefs_decorator(func, patcher):
    """ Force a method to work on the real filesystem """
    def fs_wrapper(*args, **kwargs):
        patcher.pause()
        try:
            return func(*args, **kwargs)
        finally:
            # Resume the fake filesystem even if func raises
            patcher.resume()

    if hasattr(func, '__self__'):
        # func is already bound (e.g. a classmethod): drop the class argument
        # that the new classmethod wrapper passes in
        def c_wrapper(_, *args, **kwargs):
            return fs_wrapper(*args, **kwargs)
        return classmethod(c_wrapper)
    return fs_wrapper


with Patcher(allow_root_user=False, use_known_patches=True) as patcher:
    fs_from_dataframe = fakefs_decorator(pybedtools.BedTool.from_dataframe, patcher)
    fs_to_dataframe = fakefs_decorator(pybedtools.BedTool.to_dataframe, patcher)
    fs_intersect = fakefs_decorator(pybedtools.BedTool.intersect, patcher)

    @patch('pybedtools.BedTool.from_dataframe', new=fs_from_dataframe)
    @patch('pybedtools.bedtool.BedTool.to_dataframe', new=fs_to_dataframe)
    @patch('pybedtools.bedtool.BedTool.intersect', new=fs_intersect)
    def test(df_intervals):
        bed_object = pybedtools.BedTool.from_dataframe(df_intervals)
        joined_bed_object = bed_object.intersect(bed_object)
        df = joined_bed_object.to_dataframe()
        return df

    df = test(df_intervals)
kdauria
  • Hadn't seen your answer while I corrected my own - I like the decorator approach, and also it is more generic. – MrBean Bremen Jun 20 '21 at 08:21
  • 1
    Your use of the Pause context manager is cleaner. When I combine it with the decorator, I'm pretty happy with the result. Thanks for the help! – kdauria Jun 20 '21 at 18:56
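
The combination mentioned in the last comment (a generic decorator built around a context manager) is not spelled out in the thread, so the following is only a rough sketch of what it might look like. `run_inside`, `fake_pause` and `state` are hypothetical names introduced for illustration; `fake_pause` stands in for pyfakefs' `Pause` so that the example runs with the standard library alone.

```python
import functools
from contextlib import contextmanager


def run_inside(make_context):
    """Build a decorator that runs the wrapped callable inside make_context()."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            with make_context():
                return func(*args, **kwargs)
        return wrapper
    return decorator


state = {"paused": False}


@contextmanager
def fake_pause():
    """Hypothetical stand-in for Pause(patcher.fs): flips a flag on and off."""
    state["paused"] = True
    try:
        yield
    finally:
        state["paused"] = False


@run_inside(fake_pause)
def read_flag():
    return state["paused"]


@run_inside(fake_pause)
def fail():
    raise ValueError("boom")


seen = read_flag()  # the flag as seen from inside the wrapped call
try:
    fail()
except ValueError:
    pass
after_error = state["paused"]  # the flag after the wrapped call raised
```

With pyfakefs, `make_context` would presumably be something like `lambda: Pause(patcher.fs)`, and the resulting wrapper could be passed to `patch(..., new=...)` as in the answer above. The special handling for bound classmethods from that answer is not covered by this sketch.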