I was making a configuration program to help users configure a .json file. One feature of the program is to check whether the saved JSON is the same as the new JSON being edited by the user. If the two are not the same, the program tells the user to save the .json file that is being configured.

My first thought was to read from the .json file every time I check whether the two are the same. It looked something like this:

# read from the saved json file
new_settings = {"key1": 1, "key2": 2, "array1": []} # json.load(open('config.json', 'r'))
# modifying new_settings
new_settings['array1'].append('Data')

def checkIsDifferent():
    # read from the saved json file
    saved_settings = {"key1": 1, "key2": 2, "array1": []} # json.load(open('config.json', 'r'))
    if saved_settings == new_settings:
        print('Configuration is saved')
    else:
        print('(*)Configuration is not saved')
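
For reference, here is a minimal sketch of this read-and-compare approach with the file read actually performed. It assumes the settings live in a file called `config.json`; the helper names `load_settings` and `is_different` are made up for illustration, and a temporary directory is used so that no real file is touched. Note that `json.load` takes an open file object, while `json.loads` takes a string.

```python
import json
import os
import tempfile


def load_settings(path):
    """Read a JSON file and return the parsed object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)  # json.load expects a file object, not a string


def is_different(path, new_settings):
    """Return True when the in-memory settings differ from the file on disk."""
    return load_settings(path) != new_settings


# Demo in a temporary directory (assumed file name: config.json).
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"key1": 1, "key2": 2, "array1": []}, f)

    new_settings = load_settings(config_path)
    unchanged = is_different(config_path, new_settings)  # nothing modified yet
    new_settings["array1"].append("Data")
    changed = is_different(config_path, new_settings)    # list now has an item

print(unchanged, changed)
```

Every call to `is_different` re-reads and re-parses the file, which is the cost the question is trying to avoid.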

I don't think constantly reading from a file is a good way to compare the settings in my case, so I came up with another way: copy the saved .json into a variable, and then use that variable for the comparison:

saved_settings = {"key1": 1, "key2": 2, "array1": []} # read from the saved json file
new_settings = saved_settings.copy()

# modify
new_settings['array1'].append('Data')

def checkIsDifferent():
    if saved_settings == new_settings:
        print('Configuration is saved')
    else:
        print('(*)Configuration is not saved')

The first solution worked as expected: it printed "(*)Configuration is not saved" when I ran checkIsDifferent(). But when I ran checkIsDifferent() with the second solution, it printed "Configuration is saved".

Is dict.copy() in Python broken? How can I fix the second solution?

System Environment:

Python version: Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:43:08) [MSC v.1926 32 bit (Intel)]

OS: Windows 10

ThePyGuy
TKperson
  • `dict.copy()` performs a "shallow" copy: `saved_settings['array1']` and `new_settings['array1']` are both references to the same list and the append is affecting both. Use [copy.deepcopy](https://docs.python.org/3/library/copy.html#copy.deepcopy) instead – Iain Shelvington Apr 17 '21 at 06:50
  • Pretty sure you need to make a deep copy for them to be different. The pointer to the list will be copied, but the list will be the same. – LPR Apr 17 '21 at 06:50
  • When you call copy it will take "key1" and allocate new memory for 1 because 1 is immutable (it's an integer). But when it takes "array1" it won't allocate new memory for [] because lists are mutable. – LPR Apr 17 '21 at 06:52
  • @LPR Python also has to allocate memory for empty lists. – mkrieger1 Apr 17 '21 at 07:00
  • @mkrieger1 yes, you are correct, but when you perform copy in this manner, it will be the same list in both dictionaries. It will not create a new list. – LPR Apr 17 '21 at 07:04
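
The shallow-copy behaviour described in the comments above can be checked directly with the `is` operator. This is a small sketch using the dictionaries from the question; the variable names `dicts_are_distinct` and `lists_are_shared` are introduced here only for illustration.

```python
saved_settings = {"key1": 1, "key2": 2, "array1": []}
new_settings = saved_settings.copy()  # shallow copy: new dict, same value objects

# The two dicts are separate objects...
dicts_are_distinct = new_settings is not saved_settings
# ...but both hold a reference to the very same list.
lists_are_shared = new_settings["array1"] is saved_settings["array1"]

new_settings["array1"].append("Data")  # mutates the one shared list
new_settings["key1"] = 99              # rebinds a key in the copy only

print(dicts_are_distinct, lists_are_shared)
print(saved_settings)
print(new_settings)
```

The append shows up in both dictionaries because it changes the shared list in place, while assigning to `new_settings["key1"]` only changes which object the copy's key refers to.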

1 Answer

copy() will only copy references to anything that is not of a primitive type. Use deepcopy instead.

from copy import deepcopy

saved_settings = {"key1": 1, "key2": 2, "array1": []} # read from the saved json file
new_settings = deepcopy(saved_settings)
# modify
new_settings['array1'].append('Data')

def checkIsDifferent():
    if saved_settings == new_settings:
        print('Configuration is saved')
    else:
        print('(*)Configuration is not saved')
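
As a possible extension of the code above, the deep copy can be treated as a snapshot that is refreshed whenever the file is saved. This is only a sketch: `snapshot` and `has_unsaved_changes` are hypothetical helper names, and the actual write to disk is left out.

```python
from copy import deepcopy


def snapshot(settings):
    """Return an independent copy, including nested lists and dicts."""
    return deepcopy(settings)


def has_unsaved_changes(saved, current):
    """Return True when the working settings differ from the snapshot."""
    return saved != current


saved_settings = {"key1": 1, "key2": 2, "array1": []}  # stands in for the parsed file
new_settings = snapshot(saved_settings)

before_edit = has_unsaved_changes(saved_settings, new_settings)
new_settings["array1"].append("Data")  # only the working copy changes
after_edit = has_unsaved_changes(saved_settings, new_settings)

# After writing new_settings to disk (not shown), refresh the snapshot.
saved_settings = snapshot(new_settings)
after_save = has_unsaved_changes(saved_settings, new_settings)

print(before_edit, after_edit, after_save)
```

Returning a boolean instead of printing keeps the check reusable, for example to decide whether to show the "(*)" marker.
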
rabl
  • What do you mean by the first sentence? A list is a collection of references to other objects and copy will duplicate those references no matter the underlying type – Iain Shelvington Apr 17 '21 at 06:59
  • I think copy also copies references to "primitive types" (which I'm not sure this concept even exists in Python). Everything is an object and lists contain references to objects. – mkrieger1 Apr 17 '21 at 07:00
  • @IainShelvington Sorry, that was worded poorly. Yes it will copy the list, but everything inside the list that is not a primitive type (int, float, chr, str) will only be referenced – rabl Apr 17 '21 at 07:03
  • @rabl that's not the case at all, objects that are those "primitive" types will also only be referenced – Iain Shelvington Apr 17 '21 at 07:07
  • @IainShelvington take a look at this code; I just tested it with Python3 and the assertion failed, which means it is not passed by reference for type int: `import copy; list = [2, 3]; reference = copy.copy(list); list[1] = 6; assert reference[1] == list[1]` – rabl Apr 17 '21 at 07:16
  • @rabl your code changes the object referenced at `list[1]` of course your assertion is not going to pass. This code shows that a copied list and the original have the same reference to an int: `list = [1000000]; reference = copy.copy(list); assert id(list[0]) == id(reference[0])` – Iain Shelvington Apr 17 '21 at 07:21
  • @IainShelvington If it were passed by reference (by memory address of the variable) then the content of both lists would change. – rabl Apr 17 '21 at 07:24
  • @rabl No it wouldn't. `list[index] = foo` changes the object referenced at that position in the list – Iain Shelvington Apr 17 '21 at 07:26
  • That is just how the = operator works. If you had an object that overloaded the = operator or you would change the contents of the object using a function, then the object would still be the same for both lists. See this example: https://pastebin.com/ig1npHV9 – rabl Apr 17 '21 at 07:34
  • @rabl you are misunderstanding how Python works. All "variables" (names) are just references to objects, assigning a different object to a name doesn't change the contents of anything it just changes the reference that the name holds. The `id` function returns the id of the object that a name references, try it: `a = 'foo'; print(id(a)); a = 'bar'; print(id(a))`. Or using lists: `l = [None]; s = 'abc'; id(l[0]); id(s); l[0] = s; id(l[0])` – Iain Shelvington Apr 17 '21 at 07:42
  • You are right. I assumed for type int (and other primitives) the `__assign__` function would just store the value that is passed to the function, as that would be faster than referencing. Sorry for causing confusion. – rabl Apr 17 '21 at 08:13
  • @rabl no worries :) you're right, it probably would be more efficient, it's one of many reasons that Python is pretty slow! – Iain Shelvington Apr 17 '21 at 08:19
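
The point settled in this comment thread can be reproduced in a few lines. The sketch below is not taken from any of the comments verbatim; the names `original` and `shallow` are chosen for illustration. It shows that a shallow copy shares references to every element, ints included, and that item assignment rebinds a slot while `append` mutates a shared object.

```python
import copy

original = [1000000, ["nested"]]
shallow = copy.copy(original)  # new outer list, same element objects

same_int_object = original[0] is shallow[0]  # the int is shared too
same_inner_list = original[1] is shallow[1]  # and so is the inner list

original[0] = 6          # rebinds slot 0 of `original`; `shallow` is untouched
original[1].append("x")  # mutates the inner list that both lists reference

print(same_int_object, same_inner_list)
print(original)
print(shallow)
```

Sharing an int is harmless because ints are immutable: there is no way to change the object in place, only to rebind the slot that refers to it.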