I recently started doing data science in Python and noticed that I can access the columns of a dataset in two ways. Is there an advantage to using one method over the other, or can they be used interchangeably?

import seaborn
iris = seaborn.load_dataset('iris')

print(iris.species)
print(iris['species'])

Both print statements give the same output in Jupyter.

Sociopath
Khye

2 Answers

There is no difference when reading an existing column. iris is a pandas DataFrame, and these are two different ways to access one of its columns.

Try this:

iris['species'].equals(iris.species)
# True

You can use either method, but I find the indexing approach (iris['species']) more versatile: it works for columns whose names contain spaces, it can create new columns, and it never accidentally returns a DataFrame method or attribute (e.g. iris.shape) instead of a column.
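The two pitfalls of attribute access mentioned above can be sketched as follows. This is only an illustration on a small hand-built DataFrame rather than the real iris data; the "shape", "genus", and "family" names are made up for the example.

```python
import warnings

import pandas as pd

# Small stand-in for the iris data. The "shape" column is hypothetical and
# chosen only because it collides with the DataFrame.shape attribute.
df = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "shape": ["round", "long"],
})

# Attribute access finds the built-in attribute first, not the column.
attr_result = df.shape       # the (rows, columns) tuple
col_result = df["shape"]     # the column, as a Series

# Bracket assignment creates a real column.
df["genus"] = "Iris"

# Attribute assignment does NOT create a column. It only sets a plain Python
# attribute on the object, and pandas emits a warning, silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.family = ["Iridaceae", "Iridaceae"]

print(attr_result)
print(list(df.columns))
```

After running this, "genus" appears among the columns but "family" does not, which is why bracket indexing is the safer habit when adding columns.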

Also see answers to these questions:

Matthias Fripp
  • Thank you! This really helped! Also thanks for the links as well, I was googling for "iris.series vs iris['series']" but nothing really came up – Khye Jul 17 '19 at 08:02
  • You're welcome! There's also a little mention of it in the pandas documentation [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html), but nothing very systematic. You may be able to find a more complete discussion elsewhere in the docs. – Matthias Fripp Jul 17 '19 at 09:05

Both methods of accessing the column are equivalent.

The main advantage of accessing a column of the iris DataFrame by its label (e.g. iris['species']) is that the label can contain spaces.

For example, if the DataFrame had a column named 'plant color', you could access it with iris['plant color']. However, you could not access it with iris.plant color, because that is not valid Python syntax.
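A minimal sketch of the point about spaces, using a small hand-built DataFrame. The 'plant color' column is hypothetical and is not part of the real iris dataset; the compile() call is only there to show that the dotted form is rejected by Python itself before pandas is ever involved.

```python
import pandas as pd

# Hypothetical data: 'plant color' is an invented column for illustration.
df = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "plant color": ["purple", "blue"],
})

# Bracket indexing accepts any label, including one with a space.
colors = df["plant color"]

# The dotted form cannot even be parsed, so it is checked here as a string
# instead of being written directly in the code.
try:
    compile("df.plant color", "<example>", "eval")
    dot_form_parses = True
except SyntaxError:
    dot_form_parses = False

print(list(colors))
print(dot_form_parses)
```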

natn2323