In [66]: data
Out[66]: 
   col1 col2 label
0   1.0    a     c
1   2.0    b     d
2   3.0    c     e
3   0.0    d     f
4   4.0    e     0
5   5.0    f     0

In [67]: data.label
Out[67]: 
0      c
1      d
2    NaN
3      f
4    NaN
5    NaN
Name: col2, dtype: object

In [68]: data['label']
Out[68]: 
0    c
1    d
2    e
3    f
4    0
5    0
Name: label, dtype: object

Why do data.label and data['label'] show different results?

2 Answers

The big difference I've noticed is assignment.

import random
import pandas as pd

crimes = ["ASB", "Violence", "Theft", "Public Order", "Drugs"]
columns = "SummerCrime|WinterCrime".split("|")
# Build two columns of 300 random crime categories each
df = pd.DataFrame({col: [random.choice(crimes) for _ in range(300)] for col in columns})
# FallCrime is not an existing column, so this sets a plain attribute, not a column
df.FallCrime = [random.choice(crimes) for _ in range(300)]

Gives: UserWarning: Pandas doesn't allow columns to be created via a new attribute name

The docs on attribute access also give the following warnings, which may be related to your problem:

  • You can use this access only if the index element is a valid Python identifier, e.g. s.1 is not allowed. See here for an explanation of valid identifiers.
  • The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed, but s['min'] is possible.
  • Similarly, the attribute will not be available if it conflicts with any of the following list: index, major_axis, minor_axis, items.
  • In any of these cases, standard indexing will still work, e.g. s['1'], s['min'], and s['index'] will access the corresponding element or column.
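
As a rough illustration of the name-conflict warnings above, here is a minimal sketch. The DataFrame and its column names ("min" and "index") are made up for the example, and it assumes a reasonably recent pandas.

```python
import pandas as pd

# Hypothetical frame whose column names collide with DataFrame attributes
df = pd.DataFrame({"min": [3, 1, 2], "index": [10, 20, 30]})

min_attr = df.min          # the DataFrame.min method, not the column
min_col = df["min"]        # the column named "min"

index_attr = df.index      # the row index (a RangeIndex here)
index_col = df["index"]    # the column named "index"

print(callable(min_attr))
print(min_col.tolist())
print(type(index_attr).__name__)
print(index_col.tolist())
```

In both cases bracket indexing reaches the column, while attribute access silently returns something else.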

They go on to say:

You can use attribute access to modify an existing element of a Series or column of a DataFrame, but be careful; if you try to use attribute access to create a new column, it creates a new attribute rather than a new column. In 0.21.0 and later, this will raise a UserWarning.

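So it's possible you did this without realizing: if you assigned to data.label before a label column existed, the DataFrame is left with a plain attribute called label, and data.label keeps returning that attribute even after a real label column is added.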
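
One way the output in the question could come about is sketched below. This is a guess at the sequence of steps, not something the question confirms; the mapping dictionary and the shortened data are invented for the example.

```python
import warnings
import pandas as pd

data = pd.DataFrame({"col1": [1.0, 2.0, 3.0], "col2": ["a", "b", "c"]})

# Assumed first step: "label" is not a column yet, so this only sets a
# Python attribute on the DataFrame object (pandas emits a UserWarning).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    data.label = data["col2"].map({"a": "c", "b": "d"})

# Assumed second step: the real column is created with bracket assignment.
data["label"] = ["c", "d", "e"]

attr_values = data.label      # the stale attribute, still named "col2"
col_values = data["label"]    # the actual column

print(attr_values)
print(col_values)
```

Here data.label prints a Series named col2 with a NaN in it, while data['label'] prints the real column, which matches the shape of the mismatch in the question. Sticking to data['label'] avoids the ambiguity.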

wjandrea
Charles Landau

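The difference between the two is related to assignment. With data.label you cannot create a new column; assignment through an attribute only works for a column that already exists.

data.label is attribute access, a convenience for reading an existing column, while data["label"] is the general way to read, create, and assign columns.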

Also, if you have spaces in your column name, for example df['label name'], then writing data.label name will throw an error (a SyntaxError), so bracket access is the only option.
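
A small sketch of both points, using a made-up two-column frame (the column names and values are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"label": ["c", "d"], "label name": ["x", "y"]})

# "label" already exists as a column, so attribute assignment updates it
df.label = ["e", "f"]
updated = df["label"].tolist()

# "label name" is not a valid Python identifier, so df.label name cannot
# even be parsed; bracket access is needed
is_identifier = "label name".isidentifier()
spaced = df["label name"].tolist()

print(updated)
print(is_identifier)
print(spaced)
```

Bracket access behaves the same way in every case, which is why it is the safer default.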

For more information see this Answer link

Chandella07