
Is there a way to align multiple pandas Series or DataFrames?

Say I have a list of pandas Series and want to either:

  • keep only the index elements that are present in all series (inner join), or
  • give every series an index that is the union of all indices (outer join)
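To make the two targets concrete, here is a small sketch that computes the shared index for each case using `Index.intersection` and `Index.union`. The names `idx_a`, `idx_b`, and `idx_c` are made up for illustration, and the labels are assumed to be unique.

```python
import pandas as pd

# Hypothetical example indices (names are illustrative only).
idx_a = pd.Index([1, 2, 3, 4])
idx_b = pd.Index([2, 3, 4])
idx_c = pd.Index([1, 2, 3])

# Inner join: labels present in every index.
inner = idx_a.intersection(idx_b).intersection(idx_c)

# Outer join: labels present in at least one index.
outer = idx_a.union(idx_b).union(idx_c)

print(list(inner))  # [2, 3]
print(list(outer))  # [1, 2, 3, 4]
```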

The following code achieves what I want (for both inner and outer joins):

import pandas as pd
import itertools

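def align(pd_objects, join='outer', axis=0):
    """Apply align to every pairwise combination in the list of pandas objects."""
    pd_objects = list(pd_objects)  # work on a copy so the caller's list is not modified
    for (i, j) in itertools.combinations(range(len(pd_objects)), 2):
        # align returns the pair reindexed to their joined labels
        (pd_objects[i], pd_objects[j]) = pd_objects[i].align(pd_objects[j], join=join, axis=axis)
    return tuple(pd_objects)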

s_1 = pd.Series(['a', 'b', 'c', 'd'], index=[1, 2, 3, 4])
s_2 = pd.Series(['b', 'c', 'd'], index=[2, 3, 4])
s_3 = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

Where:

(s_1x, s_2x, s_3x) = align([s_1, s_2, s_3], join='inner', axis=0)

Returns three series with index [2, 3] (values ['b', 'c'])

And:

(s_1y, s_2y, s_3y) = align([s_1, s_2, s_3], join='outer', axis=0)

Returns three series with index [1, 2, 3, 4], where labels missing from a series are filled with NaN

But I guess there is a more Pythonic and efficient way to do this?
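One possible direction, offered as a sketch rather than a verified improvement: compute the shared index once with `functools.reduce`, then `reindex` every object to it, so the result is still one object per input. `align_all` is a hypothetical helper name, not a pandas function. It assumes unique labels on the chosen axis and only handles the 'inner' and 'outer' cases.

```python
from functools import reduce

import pandas as pd


def align_all(pd_objects, join="outer", axis=0):
    """Hypothetical helper: reindex every object to one shared set of labels."""
    if join == "inner":
        combine = lambda a, b: a.intersection(b)
    elif join == "outer":
        combine = lambda a, b: a.union(b)
    else:
        raise ValueError("this sketch only handles 'inner' and 'outer'")
    # Fold the labels of all objects into a single Index.
    common = reduce(combine, [obj.axes[axis] for obj in pd_objects])
    # Reindex each object separately; the inputs are left unchanged.
    return tuple(obj.reindex(common, axis=axis) for obj in pd_objects)


s_1 = pd.Series(['a', 'b', 'c', 'd'], index=[1, 2, 3, 4])
s_2 = pd.Series(['b', 'c', 'd'], index=[2, 3, 4])
s_3 = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

inner = align_all([s_1, s_2, s_3], join="inner")
outer = align_all([s_1, s_2, s_3], join="outer")

print([list(s.index) for s in inner])
print([list(s.index) for s in outer])
```

This makes one pass to build the index and one `reindex` per object, instead of one `align` call per pair. Whether it is faster in practice would need to be measured on real data.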

Parfait
PvK
  • For multiple dataframes, you can probably check this out - https://stackoverflow.com/questions/23668427/pandas-three-way-joining-multiple-dataframes-on-columns – panktijk Sep 25 '18 at 21:09
  • thanks @panktijk, but I think reduce outputs only a single object, whereas I would like to output multiple series/dataframes; from the documentation: "For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5)." – PvK Sep 26 '18 at 06:51

0 Answers