I am trying to use pandas DataFrame.combine to combine multiple data frames. However, I can't figure out what to pass for the func parameter, and the documentation is not very clear to me. It specifies:

DataFrame.combine(other, func, fill_value=None, overwrite=True)
other : DataFrame
func : function. Function that takes two series as inputs and return a Series or a scalar
fill_value : scalar value
overwrite : boolean, default True. If True then overwrite values for common keys in the calling frame

After some research, I found that a similar method, DataFrame.combine_first, can be used with reduce as below to combine multiple data frames (link):

reduce(lambda left,right: pd.DataFrame.combine_first(left,right), [pd.read_csv(f) for f in files])

How can I use DataFrame.combine to combine multiple data frames?

lovechillcool
  • The `func` param takes two `Series` as input (the two `Series` that you want to merge) and returns the combined `Series`. You can also apply a combine criterion or filter rows/columns using the `func` param – Sohaib Farooqi Dec 25 '17 at 06:02
  • You can only combine two dataframes at a time with the `DataFrame.combine` method. Can you describe what problem you are trying to solve by providing sample dataframes and the expected output? – Sahil Dahiya Dec 25 '17 at 06:02
  • @SahilDahiya That's not entirely true. A similar method, `DataFrame.combine_first`, can combine more than 2 data frames, as the link I posted shows. – lovechillcool Dec 25 '17 at 06:06
  • @GarbageCollector How can I skip the param `func`? I don't need further criteria just simply use `combine`. – lovechillcool Dec 25 '17 at 06:09
  • You cannot skip `func` param. If you just want to combine multiple dataframes without any condition, have a look at [concat](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html) – Sohaib Farooqi Dec 25 '17 at 06:15
  • @GarbageCollector I have tried all `concat`, `append`, `merge`, `join`. Only `combine_first` in the link I pasted achieved my goal. However, I would like to use the original `combine` but no luck on `func`. Is it possible to pass on a dull function (like 1==1) just to fulfill the param? – lovechillcool Dec 25 '17 at 06:21

1 Answer

According to the documentation, DataFrame.combine adds two DataFrame objects without propagating NaN values: if one frame is missing a value for a given (column, time), the result defaults to the other frame's value (which might be NaN as well).

func is the function in which you write the logic that chooses the value. I think the lambda expression is the source of the confusion, so let me rewrite the example from the documentation without a lambda expression.

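from pandas import DataFrame

def _fn(left, right):
    # Called once per column with the two aligned Series;
    # keep whichever column has the smaller sum.
    if left.sum() < right.sum():
        return left
    else:
        return right

df1 = DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = DataFrame({'A': [1, 1], 'B': [3, 3]})
df1.combine(df2, _fn)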

Output:

   A  B
0  0  3
1  0  3
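
Since func receives whole columns as Series, it can also choose value by value instead of keeping an entire column. Here is a minimal sketch of that idea, reusing the same two frames; `elementwise_max` is a hypothetical helper name chosen for illustration, not something from the pandas documentation.

```python
import pandas as pd

def elementwise_max(left, right):
    # Hypothetical helper: for each row, keep the larger of the two values.
    # Series.where keeps `left` where the condition holds, otherwise takes `right`.
    return left.where(left >= right, right)

df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

# combine calls elementwise_max once for column A and once for column B.
result = df1.combine(df2, elementwise_max)
print(result)
```

Here column A should come from df2 and column B from df1, because the choice is made per value rather than per column sum.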

P.S.: Since the OP wants to use DataFrame.combine to replicate the behavior of DataFrame.combine_first, here is the source code of DataFrame.combine_first from the pandas GitHub repository: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L4153

def combine_first(self, other):
    import pandas.core.computation.expressions as expressions

    def combiner(x, y, needs_i8_conversion=False):
        x_values = x.values if hasattr(x, 'values') else x
        y_values = y.values if hasattr(y, 'values') else y
        if needs_i8_conversion:
            mask = isna(x)
            x_values = x_values.view('i8')
            y_values = y_values.view('i8')
        else:
            mask = isna(x_values)

        return expressions.where(mask, y_values, x_values)

    return self.combine(other, combiner, overwrite=False)
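
To apply this to more than two frames, one option is to write a small func of your own and fold the list with functools.reduce, as in the combine_first one-liner from the question. The following is only a sketch under assumptions: `prefer_left` is a hypothetical helper that mimics the "keep the left value unless it is missing" rule, and the three in-memory frames stand in for the CSV files, whose contents are not shown in the question.

```python
from functools import reduce

import pandas as pd

NAN = float('nan')

def prefer_left(left, right):
    # Hypothetical helper: keep the left value where it is not null,
    # otherwise fall back to the right value.
    return left.where(left.notna(), right)

# Stand-ins for [pd.read_csv(f) for f in files]; the data is made up.
frames = [
    pd.DataFrame({'A': [1.0, NAN], 'B': [NAN, 4.0]}),
    pd.DataFrame({'A': [10.0, 20.0], 'B': [NAN, 5.0]}),
    pd.DataFrame({'A': [100.0, 200.0], 'B': [300.0, 400.0]}),
]

# combine only takes two frames, so reduce applies it pairwise, left to right.
combined = reduce(lambda left, right: left.combine(right, prefer_left), frames)
print(combined)
```

Any other rule can be swapped in by changing the body of the helper; combine itself only requires that func accept two Series and return a Series or a scalar.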
MSS
  • Tony, if you check out the link I pasted above, a similar command `DataFrame.combine_first` can combine more than 2 data frames. https://stackoverflow.com/a/44338256/4338329. I really want to achieve a similar thing with `combine`. – lovechillcool Dec 25 '17 at 06:21
  • That example is for merging dataframes. Merging and combine are slightly different, I guess. – MSS Dec 25 '17 at 06:25
  • When I replace `merge` with `combine_first`, it worked perfectly. So I just need to know how to make it work for `combine`. – lovechillcool Dec 25 '17 at 06:49
  • Why don't you stick with combine_first, if that is working for you? Why do you want to reinvent the wheel? – MSS Dec 25 '17 at 07:37
  • I have edited my answer to include the source code of `combine_first`. Have a look. – MSS Dec 25 '17 at 08:53
  • I don't see what pasting the source code out of context can do here. – cs95 Dec 25 '17 at 09:08
  • It isn't out of context. OP wants to use the `combine` method. That requires a `func` argument. The implementation of `combine_first` shows how to write that `func`. – MSS Dec 25 '17 at 09:11
  • @Tony I would like to use the genuine method `combine`, instead of the derived one. Also, `combine_first` replaces the first data frame's null values with values from the second one, which is not required. I was trying to decipher the source code, but I was not able to figure out how to compose the `func` param. – lovechillcool Dec 25 '17 at 23:12