Is there a way to align multiple pandas Series or DataFrames?
Say I have a list of pandas Series and want to either:
- keep only the index elements that are present in all series (inner join), or
- give every series the union of all index elements (outer join).
The following code achieves what I want (for both inner and outer join):

    import itertools

    import pandas as pd

    def align(pd_objects, join='outer', axis=0):
        """apply align on all combinations of the list of pd objects"""
        pd_objects = list(pd_objects)  # work on a copy so the caller's list is untouched
        for (i, j) in itertools.combinations(range(len(pd_objects)), 2):
            # align each pair in place; repeated over all pairs, every object ends up on the same index
            (pd_objects[i], pd_objects[j]) = pd_objects[i].align(pd_objects[j], join=join, axis=axis)
        return tuple(pd_objects)

    s_1 = pd.Series(['a', 'b', 'c', 'd'], index=[1, 2, 3, 4])
    s_2 = pd.Series(['b', 'c', 'd'], index=[2, 3, 4])
    s_3 = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
Where:

    (s_1x, s_2x, s_3x) = align([s_1, s_2, s_3], join='inner', axis=0)

returns three series with index [2, 3] (values ['b', 'c']).
And:

    (s_1y, s_2y, s_3y) = align([s_1, s_2, s_3], join='outer', axis=0)

returns three series with index [1, 2, 3, 4], where labels missing from a series are filled with NaN.
But I guess there is a more Pythonic and efficient way to do it, ideally one that avoids aligning every pair of objects.
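One direction that might work, as a minimal sketch rather than a definitive answer: compute the shared index once with `Index.intersection` or `Index.union`, then `reindex` each object onto it. The helper name `align_all` is hypothetical, and the sketch assumes every object has a unique index and that alignment is along the row axis only.

```python
from functools import reduce

import pandas as pd


def align_all(pd_objects, join="outer"):
    """Reindex every object onto one shared index (hypothetical helper).

    Assumes unique indexes and alignment along axis 0 only.
    """
    indexes = [obj.index for obj in pd_objects]
    if join == "inner":
        # labels present in every object
        shared = reduce(lambda left, right: left.intersection(right), indexes)
    elif join == "outer":
        # labels present in at least one object
        shared = reduce(lambda left, right: left.union(right), indexes)
    else:
        raise ValueError("join must be 'inner' or 'outer'")
    # reindex inserts NaN wherever an object lacks a label
    return tuple(obj.reindex(shared) for obj in pd_objects)


s_1 = pd.Series(["a", "b", "c", "d"], index=[1, 2, 3, 4])
s_2 = pd.Series(["b", "c", "d"], index=[2, 3, 4])
s_3 = pd.Series(["a", "b", "c"], index=[1, 2, 3])

inner = align_all([s_1, s_2, s_3], join="inner")
outer = align_all([s_1, s_2, s_3], join="outer")
```

This touches each object once instead of once per pair, so the work grows linearly with the number of objects rather than quadratically.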