Why does the people variable show the same contents as the dfPeople DataFrame after I add the City column to dfPeople? What concept am I not understanding?

File people.csv:

id,First Name,Last Name,Age
1,José,Pereira,40
2,João,Silva,33
3,Pedro,Campos,28

Reported problem code:

import pandas as pd

people = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/people.csv')
dfPeople = pd.DataFrame(people)

city = ['Mumbai','Beijing','New York']

dfPeople["City"] = city

I visualized the data in the following ways:

print(dfPeople)
dfPeople.head()
dfPeople

print(people)
people.head()
people

Proposed solutions to the problem based on the comments.

MattDMo's Solution:

import pandas as pd

people = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/people.csv')

city = ['Mumbai','Beijing','New York']

people["City"] = city

people.head()

Emma's Solution:

import pandas as pd

people = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/people.csv')

# Copy first so that dfPeople holds its own data, separate from people
dfPeople = pd.DataFrame(people.copy())

city = ['Mumbai','Beijing','New York']

dfPeople["City"] = city

#dfPeople.head()
people.head()

campos
  • Because `pd.read_csv()` returns a `DataFrame` already, you don't need to call `pd.DataFrame()` on it. – MattDMo Mar 22 '22 at 15:47
  • "What concept am I not understanding?" https://stackoverflow.com/questions/38895768/python-pandas-dataframe-is-it-pass-by-value-or-pass-by-reference – Emma Mar 22 '22 at 15:52
  • But why does the City column exist in the people variable? I see this column when I type `people.head()`. – campos Mar 22 '22 at 16:09
  • Check this specific answer: https://stackoverflow.com/a/38924624/2956135 You are _mutating_ `dfPeople` without _rebinding_ it, and that is why the original `people` is mutated too. If you do `dfPeople = city` after `dfPeople = pd.DataFrame(people)`, that is rebinding and `people` won't be mutated. Or, if you do `dfPeople = pd.DataFrame(people.copy())`, you are passing a reference to the _copied_ object, so even if you do `dfPeople['City'] = city`, `people` won't be mutated. – Emma Mar 22 '22 at 16:34
  • I put the proposed solutions to the problem in the body of the opening question. – campos Mar 22 '22 at 23:07
  • I gave more context on why the mutation happens. However, as @MattDMo mentioned, `read_csv` already returns a DataFrame, so you do not need `pd.DataFrame(people.copy())`. – Emma Mar 24 '22 at 06:31
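
The difference between mutating and rebinding that Emma describes can be sketched as follows. This is only an illustration, not code from the thread: the CSV is inlined through `io.StringIO` instead of being read from the Drive path, and the names `csv_text`, `df_copy`, `alias` and the three boolean flags are hypothetical. It deliberately uses a plain second name (`alias = people`) rather than `pd.DataFrame(people)`, because how much the wrapped frame shares with the original may depend on the pandas version.

```python
import io
import pandas as pd

# Same rows as people.csv in the question, inlined so the sketch is self-contained
csv_text = """id,First Name,Last Name,Age
1,José,Pereira,40
2,João,Silva,33
3,Pedro,Campos,28
"""
people = pd.read_csv(io.StringIO(csv_text))
city = ['Mumbai', 'Beijing', 'New York']

# 1) Copy, then mutate the copy: people is not touched
df_copy = people.copy()
df_copy["City"] = city
copy_changed_people = "City" in people.columns

# 2) Rebinding: alias stops naming the DataFrame and names the list instead
alias = people
alias = city
rebinding_changed_people = "City" in people.columns

# 3) Mutation through a second name: alias and people are the same object
alias = people
alias["City"] = city
mutation_changed_people = "City" in people.columns

print(copy_changed_people, rebinding_changed_people, mutation_changed_people)
# prints: False False True
```

Only the third case changes `people`, because there the column is added to the very object that `people` names.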

1 Answer

I adopted @MattDMo's suggestion as the best solution: pd.read_csv() already returns a DataFrame, so the City column can be added to people directly.
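
A minimal, runnable sketch of that approach, assuming the file contents shown in the question. The CSV is inlined through `io.StringIO` so the example does not depend on the Drive path, and `csv_text` and `is_dataframe` are hypothetical names introduced here.

```python
import io
import pandas as pd

# Same rows as people.csv in the question, inlined for a self-contained example
csv_text = """id,First Name,Last Name,Age
1,José,Pereira,40
2,João,Silva,33
3,Pedro,Campos,28
"""

# read_csv already returns a DataFrame, so no pd.DataFrame() wrapper is needed
people = pd.read_csv(io.StringIO(csv_text))
is_dataframe = isinstance(people, pd.DataFrame)

# Add the new column directly; the list needs one value per row
city = ['Mumbai', 'Beijing', 'New York']
people["City"] = city

print(is_dataframe)
print(people.head())
```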

campos