Potentially unimportant context:

I have a script that writes strings to a file. These strings start in a master list and get separated out into separate lists; some of them then go through a web scraper, generating more strings, and the outputs are written to a FASTA file. As such, there's quite a lot of index checking to coordinate the outputs between these lists.

To check it's all working OK, I'd like to compare the final FASTA file to the initial master lists to check that the indexing between the different lists is producing the expected output. To do this, I am reading the FASTA file in as a big_list and then recreating the initial master list. The question below is part of that effort.

Question:

I'm trying to update a dictionary according to the contents of big_list, but my dictionary updating is doing something weird. It is assigning the values OK, but it isn't discriminating between the different keys, so all of the values are assigned as duplicate lists to each key.

I start by turning a list of keys (dict_key_list) from earlier in the script to a dictionary:

dict_key_list=[dict_key1_str, dict_key2_str, ...]
dictionary=dict.fromkeys(set(dict_key_list), []) #--> {dict_key1_str:[], dict_key2_str:[], .........}

I then try to update this according to the big_list objects described above. But like I said, something strange is going on. Can anyone tell me what, whilst I keep thinking about it?

#big list format = ['dict_key1_str \t value1_str', 'dict_key1_str \t value2_str',......,'dict_key2_str \t value1000_str','dict_key2_str \t value1001_str',... for more keys]
                   
for list_object_str in big_list:
   
    #break into a list:  'dict_key1 \t value1' --> [dict_key1, value1]
    list_object_list=re.split(r'\t+', list_object_str)
    
    #pull key from list above
    dict_key_str=list_object_list[0]
    
    #pull the value I want to append from list above
    value_add_str=list_object_list[1]
    
    #get whatever value is currently assigned to key - this is a list
    current_dict_value=dictionary[dict_key_str]
    
    #add append value to current value
    current_dict_value.append(value_add_str)
    
    #assign new value to dict key
    dictionary[dict_key_str]=current_dict_value

    #have also tried dictionary.update({dict_key:current_dict_value})

This assigns both sets of values to each key - so all values from 1 to 999 and 1000 to the end value index are all assigned as identical big lists to each key:

{'dict_key1_str':[value1_str,value2_str,...last_value_str], 'dict_key2_str':[value1_str,value2_str,...last_value_str], ... more keys same value lists}

What I actually want is:

{'dict_key1_str':[value1_str,value2_str,...value999_str], 'dict_key2_str':[value1000_str,value1001_str,...last_value_key2_str], ... more keys different value lists}
Tim Kirkwood

1 Answer

I didn't go through the code, but at first glance it looks like an issue with references and mutability. For example,

>>> a = {"foo": "bar"}
>>> b = a
>>> b["foo"] = "baz"
>>> print(a["foo"])
baz

As you can see, the assignment b = a did not copy the object: a and b are two names for the same dictionary. In the same way, every time you call append on your list, the change is visible through every reference to that list.
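To connect this to the question, here is a minimal sketch of how the same aliasing arises with dict.fromkeys when the default value is a mutable list. The key and value names are placeholders borrowed from the question, and the dict comprehension is one possible alternative, not necessarily the only one.

```python
keys = ["dict_key1_str", "dict_key2_str"]  # placeholder key names

# dict.fromkeys evaluates [] once and stores that single list under every key
shared = dict.fromkeys(keys, [])
shared["dict_key1_str"].append("value1_str")
same_object = shared["dict_key1_str"] is shared["dict_key2_str"]

# a dict comprehension evaluates [] once per key, so each key gets its own list
separate = {key: [] for key in keys}
separate["dict_key1_str"].append("value1_str")

print(same_object)
print(shared)
print(separate)
```

Because both keys in shared refer to one list object, appending through either key changes what both keys show, whereas the lists in separate are independent.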

Try using dict.update() instead,

# wrap the new value in a list: list + str would raise a TypeError
my_dict.update({key: my_dict.get(key, []) + [value_to_append]})
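Another way to avoid shared lists altogether is collections.defaultdict, which creates a fresh list the first time each key is seen. The sketch below is only an illustration under assumptions: group_by_key is a hypothetical helper, the sample lines are made up in the format the question describes, and each line is assumed to contain at least one tab between key and value.

```python
from collections import defaultdict

# made-up sample in the 'key \t value' format from the question
big_list = [
    "dict_key1_str \t value1_str",
    "dict_key1_str \t value2_str",
    "dict_key2_str \t value1000_str",
]

def group_by_key(lines):
    """Group the value of each 'key<TAB>value' line under its key."""
    grouped = defaultdict(list)  # a new, separate list per missing key
    for line in lines:
        # split on the first tab only, then trim surrounding whitespace
        key, value = line.split("\t", 1)
        grouped[key.strip()].append(value.strip())
    return dict(grouped)

result = group_by_key(big_list)
print(result)
```

Note that this builds the dictionary from the keys actually present in big_list, so a key that never appears in the file will be absent rather than mapped to an empty list.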
dingobar
  • Hi Dingobar, thanks for your reply. You were right: based on the link above, I was essentially getting the same list object for all my values, so they were all affected when I tried to alter just one (from what I understand). Your solution didn't work though (unless I wrote something wrong when I put in my actual variable names). I think this is because it focuses on the referencing within the loop (i.e. the 'current value'), rather than the referencing between the dictionary objects. Although maybe it works fine and I did something wrong like I said :P – Tim Kirkwood Nov 18 '20 at 20:36