What is the difference between a DataFrame attribute and a column?

Question

In [66]: data
Out[66]: 
   col1 col2 label
0   1.0    a     c
1   2.0    b     d
2   3.0    c     e
3   0.0    d     f
4   4.0    e     0
5   5.0    f     0

In [67]: data.label
Out[67]: 
0      c
1      d
2    NaN
3      f
4    NaN
5    NaN
Name: col2, dtype: object

In [68]: data['label']
Out[68]: 
0    c
1    d
2    e
3    f
4    0
5    0
Name: label, dtype: object

Why do data.label and data['label'] show different results?

score 2 · Answer 1 · edited Jan 18 '22 at 23:27

The big difference I've noticed is assignment.

import random
import pandas as pd

crimes = ["ASB", "Violence", "Theft", "Public Order", "Drugs"]
s = "SummerCrime|WinterCrime".split("|")
# One column per season, each holding 300 random crime types
j = {x: [random.choice(crimes) for _ in range(300)] for x in s}
df = pd.DataFrame(j)
# Try to add a third column via attribute assignment
df.FallCrime = [random.choice(crimes) for _ in range(300)]

Gives: UserWarning: Pandas doesn't allow columns to be created via a new attribute name
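To see what that warning means in practice, here is a minimal sketch (a small hypothetical two-row frame instead of the random one, with the warning captured through the standard warnings module) that checks whether FallCrime actually became a column:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"SummerCrime": ["ASB", "Theft"], "WinterCrime": ["Drugs", "Violence"]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    df.FallCrime = ["Theft", "ASB"]  # attribute assignment to a name that is not a column

print([w.category.__name__ for w in caught])  # the captured warning categories
print(list(df.columns))                       # ['SummerCrime', 'WinterCrime']
print("FallCrime" in df.columns)              # False
print(df.FallCrime)                           # ['Theft', 'ASB'], stored as a plain attribute
```

The list is kept on the object, but pandas does not treat it as a column, so it will not show up in df.columns, df.head(), or any column-wise operation.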

The pandas documentation on attribute access also carries the following warnings, which may be related to your problem:

You can use this access only if the index element is a valid Python identifier, e.g. s.1 is not allowed. See here for an explanation of valid identifiers.

The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed, but s['min'] is possible.

Similarly, the attribute will not be available if it conflicts with any of the following list: index, major_axis, minor_axis, items.

In any of these cases, standard indexing will still work, e.g. s['1'], s['min'], and s['index'] will access the corresponding element or column.
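A small sketch of those three cases, assuming a hypothetical frame whose column names are deliberately chosen to collide ("1" is not an identifier, "min" is a method name, "index" is a reserved attribute):

```python
import pandas as pd

df = pd.DataFrame({"1": [10, 20], "min": [3, 4], "index": [5, 6]})

# "1" is not a valid identifier, so df.1 would be a SyntaxError; brackets work
print(df["1"].tolist())      # [10, 20]

# df.min resolves to the DataFrame.min method, not the column
print(callable(df.min))      # True
print(df["min"].tolist())    # [3, 4]

# df.index is the row index, not the column named "index"
print(df.index.tolist())     # [0, 1]
print(df["index"].tolist())  # [5, 6]
```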

They go on to say:

You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful; if you try to use attribute access to create a new column, it creates a new attribute rather than a new column. In 0.21.0 and later, this will raise a UserWarning.

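So it's possible you did this without realizing. If data.label was assigned before a label column existed, it became an ordinary instance attribute. When the column was later created with data['label'] = ..., that stale attribute kept shadowing it: data.label returns the old attribute, while data['label'] always goes to the column. The "Name: col2" in your Out[67] fits this, since it suggests that attribute holds a Series derived from col2.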
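The shadowing scenario can be reproduced with a minimal sketch. The three-row frame and the {"a": "x"} mapping are hypothetical stand-ins for whatever produced the original data.label; only the order of the two assignments matters:

```python
import warnings

import pandas as pd

data = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": ["a", "b", "c"]})

# Step 1: attribute assignment while no 'label' column exists yet.
# pandas warns and stores the Series as an ordinary instance attribute.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    data.label = data.col2.map({"a": "x"})  # hypothetical mapping; unmapped values become NaN

# Step 2: later, the real column is created with bracket indexing.
data["label"] = ["p", "q", "r"]

# Normal attribute lookup finds the instance attribute first, so it
# shadows the column; bracket indexing always reads the column.
print(data.label.tolist())     # ['x', nan, nan]
print(data["label"].tolist())  # ['p', 'q', 'r']
print(data.label.name)         # col2
```

Deleting the stray attribute with del data.label, or simply always using data['label'], makes both spellings agree again.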

score 1 · Answer 2 · answered Nov 16 '18 at 03:34

The difference between these two is related to assignment. With data.label you cannot create a new column by assigning values to it.

Use data.label for reading a column, and data["label"] for assigning values, in particular when the column does not exist yet.
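A short sketch of where the line falls, assuming a hypothetical one-column frame: attribute assignment does update a column that already exists, but only bracket assignment creates a new one.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"label": ["c", "d"]})

df.label = ["x", "y"]          # 'label' is already a column, so this updates it
print(df["label"].tolist())    # ['x', 'y']

with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    df.other = ["u", "v"]      # no such column: creates a plain attribute instead

df["new"] = ["m", "n"]         # bracket assignment creates the column
print(list(df.columns))        # ['label', 'new']
```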

Also, if a column name contains a space, for example df['label name'], attribute access cannot be used at all: data.label name is a syntax error.
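A minimal sketch of that case, assuming a hypothetical column called "label name". Bracket indexing works, while the attribute spelling does not even parse, which compile() can show without crashing the script:

```python
import pandas as pd

df = pd.DataFrame({"label name": [1, 2]})

print(df["label name"].tolist())  # [1, 2]

# The space ends the attribute expression, so Python rejects it at parse time
is_syntax_error = False
try:
    compile("df.label name", "<example>", "eval")
except SyntaxError:
    is_syntax_error = True
print(is_syntax_error)            # True
```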

For more information, see this answer.
