
I'm working my way through Python for Data Analysis and learning a ton. However, one thing keeps coming up. The book usually refers to a DataFrame column as df['column'], but sometimes, without explanation, it uses df.column instead.

I don't understand the difference between the two. Any help would be appreciated.

Below is some code demonstrating what I am talking about:

In [5]:

import pandas as pd

data = {'column1': ['a', 'a', 'a', 'b', 'c'], 
        'column2': [1, 4, 2, 5, 3]}
df = pd.DataFrame(data, columns = ['column1', 'column2'])
df

Out[5]:
  column1  column2
0       a        1
1       a        4
2       a        2
3       b        5
4       c        3
5 rows × 2 columns

df.column:

In [8]:

df.column1
Out[8]:
0    a
1    a
2    a
3    b
4    c
Name: column1, dtype: object

df['column']:

In [9]:

df['column1']
Out[9]:
0    a
1    a
2    a
3    b
4    c
Name: column1, dtype: object
Anton
I closed this as a duplicate, but LMK if there are any intricacies I missed. I'm not a Pandas expert, but they seem to be the same. – wjandrea Jan 18 '22 at 23:32

1 Answer


For setting values, and in particular for creating a new column, you need to use df['column'] = series.
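A minimal sketch of the bracket form, reusing the question's data. The name column3 and its values are made up for illustration: assigning through df[...] registers the new label in df.columns.

```python
import pandas as pd

data = {'column1': ['a', 'a', 'a', 'b', 'c'],
        'column2': [1, 4, 2, 5, 3]}
df = pd.DataFrame(data, columns=['column1', 'column2'])

# Bracket assignment goes through DataFrame.__setitem__, so pandas
# registers 'column3' as a real column (aligned on the row index).
df['column3'] = pd.Series([10, 40, 20, 50, 30])

print(list(df.columns))  # ['column1', 'column2', 'column3']
```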

Once this is done, however, you can refer to that column in the future with df.column, assuming the name is a valid Python identifier. (So df.column works, but 6column would still have to be accessed with df['6column'], because df.6column is a syntax error.)
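To illustrate the "valid Python name" caveat, here is a small sketch. The columns '6column' and 'count' and the helper attribute_access_ok are hypothetical, made up for this example; the helper is only a rough check, not a pandas API. It also shows a second pitfall: a column whose name matches an existing DataFrame attribute or method (such as count) is shadowed by that method under attribute access.

```python
import keyword

import pandas as pd

df = pd.DataFrame({'column1': [1, 2], '6column': [3, 4], 'count': [5, 6]})


def attribute_access_ok(frame, name):
    """Hypothetical helper: rough check that frame.<name> would return the column.

    Not exhaustive; it only covers the common pitfalls shown below.
    """
    return (name in frame.columns
            and name.isidentifier()              # '6column' fails here
            and not keyword.iskeyword(name)      # e.g. a column named 'class'
            and not hasattr(type(frame), name))  # e.g. 'count' is a DataFrame method


# For an identifier-safe name, both spellings give the same data.
print(df.column1.equals(df['column1']))  # True

# df.6column is a SyntaxError, so brackets are the only option.
print(df['6column'].tolist())            # [3, 4]

# df.count is the DataFrame.count method, not the 'count' column.
print(callable(df.count))                # True
print(df['count'].tolist())              # [5, 6]

for name in ['column1', '6column', 'count']:
    print(name, attribute_access_ok(df, name))
```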

I think the subtle difference here is that when you set something with df['column'] = ser, pandas adds it to the columns and does some other bookkeeping (I believe by overriding __setitem__). If you do df.column = ser with a name that isn't already a column, it's just like adding a new attribute to any other object via __setattr__, and pandas does not turn that new attribute into a column.
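A sketch of the __setattr__ side, assuming a reasonably recent pandas (newer versions also emit a UserWarning when you assign a new attribute name, which the example silences). column4 is a hypothetical name. It also shows a nuance worth knowing: if the name is already a column, pandas forwards df.column2 = ... to that column, so attribute assignment can update an existing column; it just can't create a new one.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'column1': ['a', 'a', 'a', 'b', 'c'],
                   'column2': [1, 4, 2, 5, 3]})

# New name: pandas falls back to a plain instance attribute, so no column is added.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the UserWarning newer pandas emits
    df.column4 = pd.Series([7, 7, 7, 7, 7])

print('column4' in df.columns)  # False
print(df.column4.tolist())      # [7, 7, 7, 7, 7]  (an ordinary attribute)

# Existing column name: pandas forwards this to df['column2'] = ...
df.column2 = [10, 40, 20, 50, 30]
print(df['column2'].tolist())   # [10, 40, 20, 50, 30]
print(list(df.columns))         # ['column1', 'column2']
```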

acushner
Furthermore, you can have spaces in a column name, e.g. df['column foo bar'], whereas df.column foo bar is a syntax error. – Jeff May 08 '14 at 15:54
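A quick sketch of Jeff's point, using compile() so the invalid spelling can be checked without crashing the script. The rename to underscores at the end is just one possible workaround, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({'column foo bar': [1, 2, 3]})

# Brackets accept any label, including strings containing spaces.
print(df['column foo bar'].tolist())  # [1, 2, 3]

# The attribute spelling is not valid Python at all; compile() shows this
# without executing anything.
try:
    compile("df.column foo bar", "<example>", "eval")
    parses = True
except SyntaxError:
    parses = False
print(parses)  # False

# One possible workaround: rename to identifier-safe labels first.
df2 = df.rename(columns=lambda c: c.replace(' ', '_'))
print(df2.column_foo_bar.tolist())  # [1, 2, 3]
```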