10

I understand that I can use combine_first to merge two series:

series1 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
series2 = pd.Series([1,2,3,4,5],index=['f','g','h','i','j'])
series3 = pd.Series([1,2,3,4,5],index=['k','l','m','n','o'])

Combine1 = series1.combine_first(series2)
print(Combine1

Output:

a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
f    1.0
g    2.0
h    3.0
i    4.0
j    5.0
dtype: float64

What if I need to merge 3 or more series?

I understand that using the following code: print(series1 + series2 + series3)yields:

a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
f   NaN
...
dtype: float64

Can I merge multiple series efficiently without using combine_first multiple times?

Thanks

cs95
  • 379,657
  • 97
  • 704
  • 746
Kane Chew
  • 3,693
  • 4
  • 12
  • 24

4 Answers4

17

Combine Series with Non-Overlapping Indexes

To combine series vertically, use pd.concat.

# Setup
series_list = [
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('fghij')),
    pd.Series(range(1, 6), index=list('klmno'))
]

pd.concat(series_list)

a    1
b    2
c    3
d    4
e    5
f    1
g    2
h    3
i    4
j    5
k    1
l    2
m    3
n    4
o    5
dtype: int64

Combine with Overlapping Indexes

series_list = [
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('abcde')),
    pd.Series(range(1, 6), index=list('kbmdf'))
]

If the Series have overlapping indices, you can either combine (add) the keys,

pd.concat(series_list, axis=1, sort=False).sum(axis=1)

a     2.0
b     6.0
c     6.0
d    12.0
e    10.0
k     1.0
m     3.0
f     5.0
dtype: float64

Alternatively, just drop duplicates values on the index if you want to take only the first/last value (when there are duplicates).

res = pd.concat(series_list, axis=0)
# keep first value
res[~res.index.duplicated(keep='first')]
# keep last value
res[~res.index.duplicated(keep='last')]
cs95
  • 379,657
  • 97
  • 704
  • 746
5

Presuming that you were using the behavior of combine_first to prioritize the values of the series in order as combine_first is meant for, you could succinctly make multiple calls to it with a lambda expression.

from functools import reduce
l_series = [series1, series2, series3]
reduce(lambda s1, s2: s1.combine_first(s2), l_series)

Of course if the indices are unique as in your current example, you can simply use pd.concat instead.

Demo

series1 = pd.Series(list(range(5)),index=['a','b','c','d','e'])
series2 = pd.Series(list(range(5, 10)),index=['a','g','h','i','j'])
series3 = pd.Series(list(range(10, 15)),index=['k','b','m','c','o'])

from functools import reduce
l_series = [series1, series2, series3]
print(reduce(lambda s1, s2: s1.combine_first(s2), l_series))

# a     0.0
# b     1.0
# c     2.0
# d     3.0
# e     4.0
# g     6.0
# h     7.0
# i     8.0
# j     9.0
# k    10.0
# m    12.0
# o    14.0
# dtype: float64
miradulo
  • 28,857
  • 6
  • 80
  • 93
  • While in principle, this is the same as my answer, the touch with reduce is quite nice! – cs95 Sep 18 '17 at 03:47
  • No worries my friend. You get a +1 from me. – cs95 Sep 18 '17 at 03:55
  • @Mitch I guess I have a lot to learn because I dun quite understand the objective/purpose of using 'reduce' in this instance – Kane Chew Sep 18 '17 at 04:00
  • @KaneChew We want to apply `combine_first ` first to `series1` and `series2`, and then to that result and `series3`. `reduce` with a lambda works nicely for this - first try [checking out the docs](https://docs.python.org/3/library/functools.html#functools.reduce) with the addition example and hopefully it will make a bit more sense. – miradulo Sep 18 '17 at 04:03
1

Agree with what @codespeed has pointed out in his answer.

I think it will depend on user needs. If series index are confirmed with no overlapping, concat will be a better option. (as original question posted, there is no index overlapping, then concat will be a better option)

If there is index overlapping, you might need to consider how to handle overlapping, which value to be overwritten. (as example provided by codespeed, if index are matching to different values, need to be careful about combine_first)

i.e. (note series3 is same as series1, series2 is same as series4)

import pandas as pd
import numpy as np


series1 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
series2 = pd.Series([2,3,4,4,5],index=['a','b','c','i','j'])
series3 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
series4 = pd.Series([2,3,4,4,5],index=['a','b','c','i','j'])


print(series1.combine_first(series2))



a    1.0
b    2.0
c    3.0
d    4.0
e    5.0
i    4.0
j    5.0
dtype: float64



print(series4.combine_first(series3))



a    2.0
b    3.0
c    4.0
d    4.0
e    5.0
i    4.0
j    5.0
dtype: float64
White
  • 627
  • 4
  • 10
0

You would use combine_first if you want one series's values prioritized over the other. Its usually used to fill the missing values in the first series. I am not sure whats the expected output in your example but looks like you can use concat

pd.concat([series1, series2, series3])

You get

a    1
b    2
c    3
d    4
e    5
f    1
g    2
h    3
i    4
j    5
k    1
l    2
m    3
n    4
o    5
Vaishali
  • 37,545
  • 5
  • 58
  • 86
  • @vaishali Could you explain/elaborate what you meant by prioritized? Thanks. – Kane Chew Sep 18 '17 at 03:35
  • @Kane Chew, say you have two series, series1 = pd.Series([1,2,np.nan,np.nan,5],index=['a','b','c','d','e']) series2 = pd.Series([1,2,3,4,5],index=['f','g','h','i','j']), you use series1.combine_first(series2) to fill the missing values on the first with the second – Vaishali Sep 18 '17 at 03:41
  • @vaishali I guess you meant when both series have the same index? 'series1 = pd.Series([1,2,np.nan,np.nan,5],index=['a','b','c','d','e']) series2 = pd.Series([1,2,3,4,5],index=['a','b','c','d','e']) series3 = series1.combine_first(series2) print(series3) ' I am surprised that the values do not add up for the same index – Kane Chew Sep 18 '17 at 03:52