
I'm experimenting with copy.deepcopy from the copy module. My intention is to create a "real" (independent) copy of a variable in Python, as can be done in R or VBA, for instance.

Now take a look at my code:

This is the first part where I load the libraries and download a copy of the iris dataset from the internet.

import copy
import statsmodels.api as sm
dataset_iris = sm.datasets.get_rdataset(dataname='iris',
                                    package='datasets')

Here I'm trying to create a copy by value of the object dataset_iris:

iris = copy.deepcopy(dataset_iris)
iris = iris['data']

At this point, if I check the name of the first column in the original dataset, I get two capitalized words separated by a dot:

print(dataset_iris['data'].columns.values[0])
#Sepal.Length

Then I change the name of the first column in the copied dataset (iris):

iris.columns.values[0] = 'sepal_length'

When I check the first column's name in the original dataset, it has also changed. I would expect this behavior if I had made the "copy" with the "=" sign in Python, but not when using the deepcopy function from the copy module.

print(dataset_iris['data'].columns.values[0])
#sepal_length
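A minimal, standard-library-only sketch of the expectation described above, and of one way an object can still end up sharing data after `copy.deepcopy`. The `Labels` class is hypothetical and is not how pandas or statsmodels are implemented; it only illustrates that `copy.deepcopy` calls an object's `__deepcopy__` method when one is defined, so a class can decide to reuse an internal buffer rather than duplicate it.

```python
import copy

# Plain nested containers: "=" only binds a second name to the same object,
# while deepcopy builds new, independent containers.
original = {"data": {"columns": ["Sepal.Length", "Sepal.Width"]}}
alias = original
clone = copy.deepcopy(original)

alias["data"]["columns"][0] = "changed_via_alias"   # visible through original
clone["data"]["columns"][1] = "changed_via_clone"   # not visible through original

print(original["data"]["columns"])
print(clone["data"]["columns"])


class Labels:
    """Hypothetical class that treats its label buffer as immutable."""

    def __init__(self, values):
        self.values = values

    def __deepcopy__(self, memo):
        # Deliberately reuse the same list instead of copying it.
        return Labels(self.values)


labels = Labels(["Sepal.Length", "Sepal.Width"])
labels_copy = copy.deepcopy(labels)

# labels_copy is a new object, but both objects point at the same list,
# so an in-place edit shows up in the "original" as well.
labels_copy.values[0] = "sepal_length"
print(labels_copy is labels)
print(labels.values[0])
```

If a library object behaves like `Labels` here, editing its internals in place after a deep copy would show the same symptom as in the question, even though `copy.deepcopy` itself works as documented for ordinary dictionaries and lists.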

I'm certainly missing something basic about Python here; I just don't know exactly what it is.

Best regards,

Gustavo

Gustavo Mirapalheta
  • It seems that my example is going against what has been said in this post: https://stackoverflow.com/questions/2465921/how-to-copy-a-dictionary-and-only-edit-the-copy?rq=1 . Although the post talks about dictionaries and my post about pandas dataframes, what has been said there is exactly what I was trying to do, with the exception that the result that I got gives back the opposite result (in other words copy.deepcopy is NOT doing its trick...) – Gustavo Mirapalheta Mar 08 '19 at 00:45
  • I don't know anything about your statsmodels module but one explanation would be that the dataset_iris object is connected to a database beyond your python process, so that when you refer to its various attributes the underlying object is actually going out to a database to get/set the values. Such an external database would of course not get copied by copy.deepcopy. – electrogas Mar 08 '19 at 03:34
  • Nope, no database on my computer. iris dataset is a python dictionary. key "data" provides a pandas dataframe with the set itself. But so far I have solved the problem redownloading the dataset whenever I need it refreshed. Thanks for your help anyway. Regards, Gustavo. – Gustavo Mirapalheta Mar 08 '19 at 03:53
  • Well, I have some news on this issue. copy.deepcopy is not working on the whole dataset; on the other hand, if I apply copy.deepcopy to iris.columns.values alone, the names are copied by value. It seems to me that deepcopy is not that "deep", in the sense that it is not able to fully copy a more complex object (like iris, which has several layers of data), but it is able to fully copy a list of names (which is much shallower than a complex dataset like iris). Best regards to everybody! – Gustavo Mirapalheta Mar 08 '19 at 14:39
  • could it be that part of your data object is based on a python extension, and providing data from some c++ dll or such? – electrogas Mar 09 '19 at 03:02
  • Hi electrogas. Sorry, but I don't know whether what you mention is the case. But I have some more news on this issue. Changing iris.columns.values does change the names of the columns, but when you try to use the dataframe to make a seaborn plot it gives an error (the seaborn function scatterplot, to be more exact). On the other hand, I noticed that iris.columns can be replaced as a whole with a new list (containing the names you want to replace and the names you want to keep), but you CAN'T change them one by one. Maybe this is parameters by assignment in action. – Gustavo Mirapalheta Mar 09 '19 at 15:58
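
A minimal sketch of the whole-replacement approach described in the last comment, using a small hand-made DataFrame as a stand-in for the iris data. The values and the names `original`, `renamed`, and `replaced` are assumptions for illustration. It uses only `DataFrame.copy`, `DataFrame.rename`, and assignment to `.columns`. The pandas documentation for `DataFrame.copy` notes that the data underlying an `Index` may be shared between copies because an `Index` is meant to be immutable, which would be consistent with an in-place edit of `columns.values` leaking into the original.

```python
import pandas as pd

# Small stand-in for the iris data (made-up values, same column naming style).
original = pd.DataFrame({"Sepal.Length": [5.1, 4.9], "Sepal.Width": [3.5, 3.0]})

# Option 1: rename() returns a new DataFrame with new column labels.
renamed = original.copy().rename(columns={"Sepal.Length": "sepal_length"})

# Option 2: assign a complete list of names; this builds a new Index
# for the copy instead of editing the existing one in place.
replaced = original.copy()
replaced.columns = ["sepal_length", "sepal_width"]

print(list(original.columns))
print(list(renamed.columns))
print(list(replaced.columns))
```

Both options leave the column labels of `original` untouched, since neither writes into the array behind an existing `Index`.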

0 Answers