I have the following nested dictionary:

table_dict = {
    "ints": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 0,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "strings": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 0,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "floats": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 0,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    }
  }

When I run the line table_dict["ints"]["range metrics"]["mean"] = 11, it changes every "mean" value instead of just the one in the "ints" dict. Here is what my dict looks like after that line:

table_dict = {
    "ints": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 11,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "strings": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 11,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    },
    "floats": {
      "domain highlights": {
        "rows": 3000000,
        "nulls": 5
      },
      "range metrics": {
        "mean": 11,
        "maximum": 0,
        "minimum": 0
      },
      "focus values": {
        "1": 0,
        "sample_text": 0
      }
    }
  }

How do I change only one value instead of overwriting all of them? Is there a different way to set values in nested dictionaries that I need to use?

For the person asking how I first created table_dict:

import pandas as pd

focus_values = [1, "sample_text"]
table_name = ""
setup = True
col_dict = {"domain highlights": {"rows": 0, "nulls": 0},
            "range metrics": {"mean": 0, "maximum": 0, "minimum": 0},
            "focus values": {}}
for i in focus_values:
    col_dict["focus values"][i] = 0

file = "large_sample_file.csv"
file_df = pd.read_csv(file, chunksize=10000)
for chunk in file_df:
    if setup:
        setup = False
        table_name = chunk.iloc[1, 0] # table name is in column 1
        table_dict = {}
        for i in chunk.columns[1:]:
            table_dict[i] = col_dict
    profile_table(chunk, table_dict)
  • Sounds like you re-used a single reference when constructing the dict. You need to create a new `"range metrics"` dict for each value you insert. – flakes Jul 01 '21 at 22:04
  • Can you show us how you initially constructed `table_dict`? – flakes Jul 01 '21 at 22:06
  • I think I see my error, I need to create a copy of col_dict in the for loop. – Daniel Young Jul 01 '21 at 22:12
  • I recently asked a similar question to this https://stackoverflow.com/questions/67429227/preventing-reference-re-use-during-deepcopy – flakes Jul 01 '21 at 22:18

1 Answer

From my comment, it sounds like you re-used a single reference when constructing the dict. You need to create a new "range metrics" dict for each value you insert. Here's an example of the pitfall.

initial_data = {
    "a": 0,
    "b": 1,
}

data = {
    "c": initial_data,
    "d": initial_data,
}

print(data)
data["c"]["a"] = 2
print(data)

This prints:

{'c': {'a': 0, 'b': 1}, 'd': {'a': 0, 'b': 1}}
{'c': {'a': 2, 'b': 1}, 'd': {'a': 2, 'b': 1}}

Notice that both keys c and d have their subkey a updated. That is because both c and d refer to the same dict object, i.e. id(data["c"]) == id(data["d"]).
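
The shared reference can also be checked directly with the `is` operator. This is a minimal sketch reusing the names from the example above; the `same_object` and `separate` names are only there for illustration:

```python
initial_data = {"a": 0, "b": 1}
data = {"c": initial_data, "d": initial_data}

# Both keys hold the very same dict object, not two equal copies.
same_object = data["c"] is data["d"]
print(same_object)  # True

# Equal contents alone do not imply a shared object.
separate = {"c": {"a": 0, "b": 1}, "d": {"a": 0, "b": 1}}
print(separate["c"] == separate["d"])  # True: same contents
print(separate["c"] is separate["d"])  # False: distinct objects
```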

Instead, you need to create a new dict for each value. I would recommend a helper function for this:

def initial_data(): 
    return {
        "a": 0,
        "b": 1,
    }

data = {
    "c": initial_data(),
    "d": initial_data(),
}

print(data)
data["c"]["a"] = 2
print(data)

Now you will get your expected results:

{'c': {'a': 0, 'b': 1}, 'd': {'a': 0, 'b': 1}}
{'c': {'a': 2, 'b': 1}, 'd': {'a': 0, 'b': 1}}
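
Applied to the structure from the question, the same idea might look like the sketch below. The helper name `make_col_dict` and the hard-coded `columns` list are assumptions for illustration; in the original code the column names come from `chunk.columns[1:]`.

```python
def make_col_dict(focus_values):
    """Build a fresh per-column stats dict on every call (hypothetical helper)."""
    return {
        "domain highlights": {"rows": 0, "nulls": 0},
        "range metrics": {"mean": 0, "maximum": 0, "minimum": 0},
        "focus values": {value: 0 for value in focus_values},
    }

focus_values = [1, "sample_text"]
columns = ["ints", "strings", "floats"]  # stand-in for chunk.columns[1:]

# Each column gets its own independent nested dict.
table_dict = {name: make_col_dict(focus_values) for name in columns}
table_dict["ints"]["range metrics"]["mean"] = 11

print(table_dict["ints"]["range metrics"]["mean"])     # 11
print(table_dict["strings"]["range metrics"]["mean"])  # 0
```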
flakes
  • Thank you, this works perfectly. Before you posted, I was trying to use .copy() and dict(), but those did not fix the problem. – Daniel Young Jul 01 '21 at 22:23
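
A likely reason `.copy()` and `dict()` did not help here is that both make shallow copies: the outer dict is new, but the nested dicts inside it are still shared. The sketch below contrasts that with `copy.deepcopy` from the standard library, using a trimmed-down `col_dict` as an assumption about the setup:

```python
import copy

col_dict = {"range metrics": {"mean": 0, "maximum": 0, "minimum": 0}}

shallow = col_dict.copy()       # new outer dict, same inner dict
deep = copy.deepcopy(col_dict)  # new outer dict and new inner dict

shallow["range metrics"]["mean"] = 11

print(col_dict["range metrics"]["mean"])  # 11: inner dict is shared with the shallow copy
print(deep["range metrics"]["mean"])      # 0: the deep copy is unaffected
```

So `table_dict[i] = copy.deepcopy(col_dict)` inside the loop would be an alternative to a helper function.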