I have several series variables I would like to concatenate (along axis=1) to create a DataFrame. I would like the series' names to appear as column names in the DataFrame. I have come across several ways to do this.
To me, the most intuitive approach is the following:
import pandas as pd
x1 = pd.Series([1,2,3],name='x1')
x2 = pd.Series([11,12,13],name='x2')
df = pd.DataFrame([x1,x2])
print(df)
But rather than make the Series names the column headers, the series data are used as rows in the DataFrame.
     0   1   2
x1   1   2   3
x2  11  12  13
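For reference, transposing that result gives the layout I was expecting. This is only a minimal sketch that reuses x1 and x2 from above; the names `rows` and `cols` are introduced here for illustration.

```python
import pandas as pd

x1 = pd.Series([1, 2, 3], name='x1')
x2 = pd.Series([11, 12, 13], name='x2')

# A list of Series is interpreted row-wise: each Series becomes one row,
# and its name becomes the row label.
rows = pd.DataFrame([x1, x2])

# Transposing swaps rows and columns, so the Series names become column headers.
cols = rows.T
print(cols)
```

This works, but it still means building the "wrong" shape first and then flipping it.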
This strikes me as counter-intuitive for two reasons.
First, the data in a Series is likely to be all of one kind, e.g. stock prices or time series data. So it seems intuitive that the Series data should be a column, rather than a row, in the DataFrame.
Second, when extracting a column as a Series from an existing DataFrame, the column name is used as the name of the Series.
Example:
df = pd.DataFrame({'x1' : [1,2,3], 'x2' : [4,5,6]})
print(type(df['x1']))
print(df['x1'].name)
<class 'pandas.core.series.Series'>
x1
So why isn't the name used as the column header when constructing a DataFrame from a Series?
I can always construct a DataFrame from a dictionary to get the result I want:
df = pd.DataFrame({'x1' : x1, 'x2' : x2})
print(df)
   x1  x2
0   1  11
1   2  12
2   3  13
But this strikes me as awkward, since I would have to duplicate the series' names (or at least refer to them in the construction of the dictionary).
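The duplication can be avoided by building the dictionary from each Series' own name attribute. This is a rough sketch, assuming every Series has a unique, non-None name; `series_list` and `df_from_names` are names made up for this example.

```python
import pandas as pd

x1 = pd.Series([1, 2, 3], name='x1')
x2 = pd.Series([11, 12, 13], name='x2')

series_list = [x1, x2]

# Key each Series by its own .name so the column labels are not typed twice.
# Assumes the names are unique and not None; duplicate names would overwrite
# each other in the dict.
df_from_names = pd.DataFrame({s.name: s for s in series_list})
print(df_from_names)
```

Still, this is more ceremony than I would expect for such a common operation.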
On the other hand, the pandas concat function does what I would expect as the default behavior:
df = pd.concat([x1,x2],axis=1)
print(df)
   x1  x2
0   1  11
1   2  12
2   3  13
So my question is: why isn't the behavior I get with concat the default behavior when constructing a DataFrame from a list of Series?