I have two Series s1 and s2 with the same (non-consecutive) indices. How do I combine s1 and s2 into two columns of a DataFrame and keep one of the indices as a third column?
9 Answers
I think concat is a nice way to do this. If the Series have name attributes, it uses them as the column names (otherwise it simply numbers the columns):
In [1]: s1 = pd.Series([1, 2], index=['A', 'B'], name='s1')
In [2]: s2 = pd.Series([3, 4], index=['A', 'B'], name='s2')
In [3]: pd.concat([s1, s2], axis=1)
Out[3]:
s1 s2
A 1 3
B 2 4
In [4]: pd.concat([s1, s2], axis=1).reset_index()
Out[4]:
index s1 s2
0 A 1 3
1 B 2 4
Note: This extends to more than 2 Series.
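For example, a minimal sketch continuing the session above with a third, hypothetical Series s3:
In [5]: s3 = pd.Series([5, 6], index=['A', 'B'], name='s3')
In [6]: pd.concat([s1, s2, s3], axis=1).reset_index()
Out[6]:
  index  s1  s2  s3
0     A   1   3   5
1     B   2   4   6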

- this actually avoids copying too (as compared to the dict solution) – Jeff Aug 05 '13 at 16:27
- In one instance, it seems to be telling me 'ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()' - any ideas? – user7289 Aug 06 '13 at 12:11
- @user7289 not sure where that would come from, could you ask this as another question? – Andy Hayden Aug 06 '13 at 12:48
- @AndyHayden: what if there are duplicates along one or both indexes? – Mannaggia Sep 03 '14 at 16:23
- @Mannaggia hmmmm, I think it depends what you want the answer to be... something with merge: `pd.merge(s1.reset_index(), s2.reset_index(), how='outer')` ? – Andy Hayden Sep 03 '14 at 21:29
- @Jeff From the pandas website: "It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension." – dafinguzman Oct 14 '15 at 21:10
- @dafinguzman what "constantly reusing this function" means is that you should prefer doing the concat **once** `pd.concat([list_of_dataframes])` vs concatenating many times `new_df = pd.DataFrame(); for df in list_of_dfs: new_df = pd.concat([new_df, df])` or similar. – Andy Hayden Oct 14 '15 at 22:07
- What happens if one of the Series already exists in a DataFrame, but I then try to concat a Series whose values are greater in length than the previous Series? – mp252 Dec 05 '18 at 15:26
- It should do an outer join in that case, BUT you may want to be more specific in cases like this (to use a join/merge). See https://pandas.pydata.org/pandas-docs/stable/merging.html – Andy Hayden Dec 05 '18 at 18:52
- You really ought to point out that this grammar is an extremely special case when the two Series share the same index - otherwise you're hiding a lot of complexity under the hood. – gented Apr 16 '20 at 12:24
You can use to_frame
if both have the same indexes.
>= v0.23
a.to_frame().join(b)
< v0.23
a.to_frame().join(b.to_frame())
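A minimal runnable sketch of this approach (the Series a and b and their values are assumptions here, since the answer does not define them):
import pandas as pd

a = pd.Series([1, 2, 3], index=[7, 2, 8], name='a')
b = pd.Series([4, 5, 6], index=[7, 2, 8], name='b')

# to_frame() turns the first Series into a one-column DataFrame;
# join() then aligns the second Series on the shared index.
print(a.to_frame().join(b))
#    a  b
# 7  1  4
# 2  2  5
# 8  3  6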

- Maybe this would be more appropriate: `a.to_frame(name='a').join(b.to_frame(name='b'))` – Ashok K Harnal Aug 29 '17 at 03:22
Pandas will automatically align the passed-in Series and create the joint index (they happen to be the same here). reset_index moves the index to a column.
In [1]: from pandas import Series, DataFrame; from numpy.random import randn
In [2]: s1 = Series(randn(5), index=[1, 2, 4, 5, 6])
In [4]: s2 = Series(randn(5), index=[1, 2, 4, 5, 6])
In [8]: DataFrame(dict(s1 = s1, s2 = s2)).reset_index()
Out[8]:
index s1 s2
0 1 -0.176143 0.128635
1 2 -1.286470 0.908497
2 4 -0.995881 0.528050
3 5 0.402241 0.458870
4 6 0.380457 0.072251

If I may answer this.
The fundamentals behind converting a Series to a DataFrame are to understand that:
1. At a conceptual level, every column in a DataFrame is a Series.
2. Every column name is a key that maps to a Series.
If you keep the above two concepts in mind, you can think of many ways to convert a Series to a DataFrame. One easy solution is like this:
Create two Series here:
import pandas as pd
series_1 = pd.Series(list(range(10)))
series_2 = pd.Series(list(range(20,30)))
Create an empty DataFrame with just the desired column names:
df = pd.DataFrame(columns=['Column_name#1', 'Column_name#2'])
Put the Series values inside the DataFrame using the mapping concept:
df['Column_name#1'] = series_1
df['Column_name#2'] = series_2
Check the results:
df.head(5)
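If, as in the question, you also want to keep one of the original indices as a third column, the same mapping idea applies. Continuing the snippet above (the column name 'index_col' is just an illustrative choice):
df['index_col'] = series_1.index   # the old index becomes an ordinary column
df.head(5)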
Example code:
import pandas as pd

a = pd.Series([1, 2, 3, 4], index=[7, 2, 8, 9])
b = pd.Series([5, 6, 7, 8], index=[7, 2, 8, 9])
data = pd.DataFrame({'a': a, 'b': b, 'idx_col': a.index})
Pandas allows you to create a DataFrame from a dict with Series as the values and the column names as the keys. When it finds a Series as a value, it uses the Series index as part of the DataFrame index. This data alignment is one of the main perks of Pandas. Consequently, unless you have other needs, the freshly created DataFrame contains duplicated data: in the above example, data['idx_col'] has the same data as data.index.
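A short sketch of the duplication just described, continuing the snippet above (the printed output assumes the shared index keeps its original order):
print((data['idx_col'] == data.index).all())   # True: the column duplicates the index
print(data)
#    a  b  idx_col
# 7  1  5        7
# 2  2  6        2
# 8  3  7        8
# 9  4  8        9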

Not sure I fully understand your question, but is this what you want to do?
pd.DataFrame(data=dict(s1=s1, s2=s2), index=s1.index)
(index=s1.index is not even necessary here)

A simplification of the solution based on join():
df = a.to_frame().join(b)

If you are trying to join Series of equal length but their indexes don't match (which is a common scenario), then concatenating them will generate NAs wherever they don't match.
import pandas as pd

x = pd.Series({'a': 1, 'b': 2})
y = pd.Series({'d': 4, 'e': 5})
pd.concat([x, y], axis=1)
#Output (I've added column names for clarity)
Index x y
a 1.0 NaN
b 2.0 NaN
d NaN 4.0
e NaN 5.0
Assuming that you don't care whether the indexes match, the solution is to reset the index of both Series before concatenating them. If drop=False, which is the default, then Pandas will save the old index in a column of the new DataFrame (the indexes are dropped here for simplicity).
pd.concat([x.reset_index(drop=True),y.reset_index(drop=True)],axis=1)
#Output (column names added):
Index x y
0 1 4
1 2 5
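And a sketch of the drop=False variant mentioned above; the Series are given names here (an addition to the original snippet) so the resulting column labels are readable:
import pandas as pd

x = pd.Series({'a': 1, 'b': 2}, name='x')
y = pd.Series({'d': 4, 'e': 5}, name='y')

# With drop=False (reset_index's default), each old index is kept as an 'index' column.
print(pd.concat([x.reset_index(), y.reset_index()], axis=1))
#   index  x index  y
# 0     a  1     d  4
# 1     b  2     e  5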

I used pandas to convert my NumPy array (or Series) to a DataFrame and then added an additional column under the key 'prediction'. If you need the DataFrame converted back to a list, use values.tolist():
import pandas as pd

# X_test and y_pred come from the surrounding model code
output = pd.DataFrame(X_test)
output['prediction'] = y_pred           # add the predictions as a new column
output_list = output.values.tolist()    # convert back to a plain list of rows
