I ran the following Python code, which creates a Pandas DataFrame with two Series (a and b), and then attempts to create two new Series (c and d):
import pandas as pd
df = pd.DataFrame({'a':[1, 2, 3], 'b':[4, 5, 6]})
df['c'] = df.a + df.b
df.d = df.a + df.b
My understanding is that if a Pandas Series is part of a DataFrame, and the Series name does not have any spaces (and does not collide with an existing attribute or method), the Series can be accessed as an attribute of the DataFrame. As such, I expected that line 3 would work (since that's how you create a new Pandas Series), and I expected that line 4 would fail (since the d attribute does not exist for the DataFrame until after you execute that line of code).
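To make that understanding concrete, here is a minimal sketch of attribute access on existing columns. The frame `demo` and its column names are hypothetical and chosen only for illustration; the column named count is there to show a collision with the DataFrame.count method.

```python
import pandas as pd

# Hypothetical frame: 'a' is a plain column name, while 'count'
# collides with the existing DataFrame.count method.
demo = pd.DataFrame({'a': [1, 2], 'count': [5, 6]})

# Attribute access and bracket access return the same data for 'a'.
same_as_brackets = demo.a.equals(demo['a'])

# demo.count resolves to the method, not the column, so the column
# is only reachable with brackets.
count_is_method = callable(demo.count)
count_column = demo['count']

print(same_as_brackets)
print(count_is_method)
print(list(count_column))
```

Bracket access works for every column name, which is why it is the form I used on line 3.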
To my surprise, line 4 did not result in an error. Instead, the DataFrame now contains three Series:
>>> df
   a  b  c
0  1  4  5
1  2  5  7
2  3  6  9
And there is a new object, df.d, which is a Pandas Series:
>>> df.d
0    5
1    7
2    9
dtype: int64
>>> type(df.d)
pandas.core.series.Series
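For anyone who wants to reproduce the observation, the checks above can be sketched as a self-contained script. It only restates what I saw (three columns, plus a separate df.d object) and does not explain why. The warnings filter is my own addition, on the assumption that some later Pandas versions warn about this kind of assignment; it is not needed on 0.17.1.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df['c'] = df.a + df.b  # bracket assignment: adds a column

with warnings.catch_warnings():
    # Assumption: newer Pandas versions may warn here; silence it
    # so the script output stays focused on the checks below.
    warnings.simplefilter('ignore')
    df.d = df.a + df.b  # attribute assignment: no column is added

columns = list(df.columns)
d_is_column = 'd' in df.columns
d_is_series = isinstance(df.d, pd.Series)

print(columns)
print(d_is_column)
print(d_is_series)
```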
My questions are as follows:
- Why did line 4 not result in an error?
- Is df.d now a "normal" Pandas Series with all of the regular Series functionality?
- Is df.d in any way "connected" to the df DataFrame, or is it a completely independent object?
My motivation in asking this question is simply that I want to better understand Pandas, and not because there is a particular use case for line 4.
My Python version is 2.7.11, and my Pandas version is 0.17.1.