
I am trying to generate a {single-key : [multi-value]} dictionary in Python from a .txt file.

This is the text file (tab-separated):

A02.835.583.748      A02.880     0.818181818181818
A02.835.583.748      A02.513     0.818181818181818
A02.835.583.748      A01.378.800.750     0.636363636363636
A02.835.583      A02.880     0.863636363636364
A02.835.583      A02.513     0.863636363636364
A02.835.583      A01.378.800.750     0.681818181818182
A01.378.800.750      A02.880     0.727272727272727
A01.378.800.750      A02.513     0.727272727272727
A01.378.800.750      A01.378.800.750     1

To build it, I use the "defaultdict()" function, but the resulting dictionary comes out malformed. For example, this is what I get when I fetch one of its keys:

print(anaDict.get('A02.835.583.748'))

Output:

['A02.880=0.818181818181818', [...], ['A02.513=0.818181818181818'], ['A01.378.800.750=0.636363636363636']]

However, the [...] in this output seems to nest the other values of the same key inside itself, in an inception kind of way.
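For reference, [...] is how Python prints a list that contains itself. In the code below, the first line seen for a key appears to run both "if" branches: the first stores value, and the second (the key now exists) appends value to that very same list. A minimal reproduction of that effect, assuming a made-up key 'k' and a plain dict d for illustration:

```python
d = {}
value = ['A02.880=0.818181818181818']
d['k'] = value          # the dict now holds the very same list object as value
d['k'].append(value)    # ...so the list gets appended to itself
print(d['k'])           # ['A02.880=0.818181818181818', [...]]
print(d['k'][1] is d['k'])  # True: the nested element is the list itself
```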

Here is my code:

from collections import defaultdict

anaDict = defaultdict()
anaSet = set()
with open(f, 'r') as anaFile:
    if '148' in f:
        for line in anaFile:
            key = line.split('\t')[0].rstrip()
            conclusionVal = line.split('\t')[1].strip()
            simScore = line.split('\t')[2].strip()
            value = [conclusionVal + "=" + simScore]
            if key not in anaDict:
                print("Here it goes: " , key, value)
                anaDict[key] = value                    
            if key in anaDict:
                print("Different value: ", key, value)
                anaDict[key].append(value)

        print(anaDict.get('A02.835.583.748'))
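One detail worth noting about anaDict = defaultdict() above: called with no argument, defaultdict has no default_factory, so it behaves like an ordinary dict and does not create missing keys. A small sketch comparing it with defaultdict(list) (the names no_factory and with_factory are illustrative only):

```python
from collections import defaultdict

no_factory = defaultdict()  # default_factory is None, so this acts like a plain dict
try:
    no_factory['missing'].append('x')
except KeyError:
    print("KeyError: defaultdict() without a factory does not create missing keys")

with_factory = defaultdict(list)  # a missing key is created as an empty list first
with_factory['missing'].append('x')
print(dict(with_factory))  # {'missing': ['x']}
```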

I expected the code to generate the following dictionary (shown as key-value pairs):

A02.835.583.748 : [A02.880 = 0.818181818181818, A02.513 = 0.818181818181818, A01.378.800.750 = 0.636363636363636]
A02.835.583 : [A02.880 = 0.863636363636364, A02.513 = 0.863636363636364, A01.378.800.750 = 0.681818181818182]
A01.378.800.750 : [A02.880 = 0.727272727272727, A02.513 = 0.727272727272727, A01.378.800.750 = 1]

I cannot figure out what I am doing wrong. I would be grateful for any help or direction.

PinkBanter

2 Answers


This line here is your problem:

anaDict[key].append(value)

When you use list.append, you add the argument, in its entirety, to the list as a single element. Since you're passing in a list, you end up putting a list inside the list. What you've described wanting is to append each of the values from the argument list, not the list itself. All you have to do is replace append with extend.

anaDict[key].extend(value)

Then the interpreter iterates over the argument list and appends each of its values individually.
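A quick side-by-side of the two methods, using values from the question's file (the variable names appended and extended are just for illustration):

```python
appended = ['A02.880=0.818181818181818']
appended.append(['A02.513=0.818181818181818'])  # adds the whole list as one element
print(appended)  # ['A02.880=0.818181818181818', ['A02.513=0.818181818181818']]

extended = ['A02.880=0.818181818181818']
extended.extend(['A02.513=0.818181818181818'])  # adds each element of the argument
print(extended)  # ['A02.880=0.818181818181818', 'A02.513=0.818181818181818']
```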

mypetlion

Here is the modified code following the suggestion from @mypetlion, which works:

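from collections import defaultdict

anaDict = defaultdict()
anaSet = set()
with open(f, 'r') as anaFile:
    if '148' in f:
        for line in anaFile:
            key = line.split('\t')[0].rstrip()
            conclusionVal = line.split('\t')[1].strip()
            simScore = line.split('\t')[2].strip()
            value = [conclusionVal + "=" + simScore]
            if key not in anaDict:
                # first value seen for this key: store a new one-element list
                anaDict[key] = value
            else:
                # 'else' instead of a second 'if': otherwise the list just stored
                # would be extended with itself and its first value would appear twice
                anaDict[key].extend(value)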
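A more compact variant, sketched under the assumption that every non-blank line has exactly the three tab-separated fields shown in the question: give defaultdict a list factory so each key starts as an empty list, and append one string per line. The function name build_ana_dict is hypothetical, and io.StringIO only stands in for the real file here:

```python
from collections import defaultdict
import io


def build_ana_dict(lines):
    """Group 'conclusion=score' strings by the key in the first column.

    Assumes each non-blank line has three tab-separated fields:
    key, conclusion value, similarity score.
    """
    ana_dict = defaultdict(list)  # a missing key starts out as an empty list
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline at end of file
        key, conclusion_val, sim_score = (p.strip() for p in line.split('\t')[:3])
        # append() adds a single string, so no nested lists are created
        ana_dict[key].append(conclusion_val + "=" + sim_score)
    return dict(ana_dict)  # plain dict: later lookups of unknown keys won't add entries


sample = io.StringIO(
    "A02.835.583.748\tA02.880\t0.818181818181818\n"
    "A02.835.583.748\tA02.513\t0.818181818181818\n"
    "A02.835.583.748\tA01.378.800.750\t0.636363636363636\n"
    "A02.835.583\tA02.880\t0.863636363636364\n"
)
ana_dict = build_ana_dict(sample)
print(ana_dict['A02.835.583.748'])
# ['A02.880=0.818181818181818', 'A02.513=0.818181818181818', 'A01.378.800.750=0.636363636363636']
```

With a real file, the open file object can be passed directly, e.g. with open(f) as anaFile: anaDict = build_ana_dict(anaFile).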
PinkBanter