I'm running this in Jupyter (IPython). I have a dictionary where each key is a compound and each value is an instance of the Models class. I'm iterating through my list of compounds and trying to append each individual model to the models list attribute of that compound's Models instance.

The running total works perfectly: `D_compound_Models[cpd].summation += accuracy` correctly adds up whether each model is good or not. But when I call `D_compound_Models[cpd].models.append(model)`, it looks like ALL of the models for all the compounds are appended cumulatively to every instance.

When I check `len(D_compound_Models[cpd].models)`, it's far more than `len(lpo)`, which should be 780. I found a workaround: build a list outside the lpo for-loop and assign it to the instance's attribute at the end. But that's not the way I want to do it.

Why is the list appending operation not working as it should?

Especially since the cumulative sum is working correctly...

Here's my class

class Models:
    def __init__(self,compound=None,models=[],summation=0.0,duration=0.0):
        self.compound = compound
        self.models = models
        self.summation = summation
        self.duration = duration
    def score(self):
        return self.summation/len(self.models)

Here's where I'm adding the instances of the class to their corresponding compounds

from sklearn.cross_validation import LeavePOut

D_compound_Models = {}
lpo = LeavePOut(len(DF_attributes.index), p=2) #All combinations of 40 index values while leaving out 2
query_compounds = ["AG1024","AG1478"]

#Check order of indices
for cpd in query_compounds:
    #Create compound instance
    D_compound_Models[cpd] = Models(compound=cpd)
    #Get sensitivity column for compound
    SR_compound = DF_compoundSensitivity[cpd]
    #Create and train models
    for index_values in lpo: #There should be 780 of these

        #Create model
        model = #Some model object

        #a bunch of model training stuff that's irrelevant       

        accuracy = #1 or 0 

        #Store models
        D_compound_Models[cpd].models.append(model)
        D_compound_Models[cpd].summation += accuracy

This is how I'm checking it at the end

for cpd in query_compounds:
    M = D_compound_Models[cpd]

    print(cpd,M.summation,len(M.models),M.score())

This is the correct output, produced the way I don't want to do it (assigning the list at the end)

#Correct: Note how 630 and 528 are not == 780. I do some filtering so it's alright
('AG1024', 304.0, 630, 0.48253968253968255)
('AG1478', 221.0, 528, 0.4185606060606061)

This is the incorrect output, produced by the code above (appending to the list)

#Incorrect
('AG1024', 304.0, 1158, 0.26252158894645944)
('AG1478', 221.0, 1158, 0.19084628670120898)

I'm making sure to reset the class every time I run it in Jupyter. I've even restarted the kernel and got the same results...

O.rka
  • Can you generalize your question? Surely you don't need us to understand all 3 conditions, for example. – Eli Dinkelspiel Oct 18 '15 at 01:11
  • yea sorry, the conditions are irrelevant for the example. i'll fix it right now. – O.rka Oct 18 '15 at 01:11
  • Thanks. I don't know your lib but I can take a look. If anything, it should make it easier for someone smarter than me to answer :) – Eli Dinkelspiel Oct 18 '15 at 01:12
  • I fixed it! lol, yea that condition was extra confusion if you don't know the data my bad. I've just started doing machine learning and using classes so my apologies if my syntax is unorthodox. – O.rka Oct 18 '15 at 01:15
  • No problem. I don't know jack about machine learning but I DO know about classes. – Eli Dinkelspiel Oct 18 '15 at 01:15
  • Dude, it's still a clusterfudge. You need to REALLY generalize your code. As in, rip out all the machine learning stuff unless it's part of your problem. – Eli Dinkelspiel Oct 18 '15 at 01:21
  • I haven't read the whole post, but I saw that you have problems when appending to a list and I see that you have a mutable default argument. . . That makes me think of http://stackoverflow.com/questions/1132941/least-astonishment-in-python-the-mutable-default-argument -- Which would make a good read even if it isn't related to your current problem (but I _really_ think it is)... – mgilson Oct 18 '15 at 01:33
  • @eli_dink I think I generalized it as much as I could. I deleted all the messy stuff. the most important part of the problem is in the last few lines of the main block of code where i'm appending to the list instance – O.rka Oct 18 '15 at 01:35
  • I'm a little lost so I'm just going to suggest things that will help your code in general, and then maybe we'll randomly hit on the right answer. Ok, the first thing I can think of is to try setting `M = Models(compound=cpd)` right after the `for` loop and worry about adding it to the dictionary after you've done all your desired operations. That should clean up your code. Also, have you considered writing some class methods? That will help readability at least. Also, your variable names suck. – Eli Dinkelspiel Oct 18 '15 at 01:42
  • if it starts with a DF_ its a dataframe. if it starts with a D_ it's a dictionary. if it's upper case it's a class. if something is plural (has an s at the end) it's an iterable. lpo stands for leave-pair-out. – O.rka Oct 18 '15 at 02:31
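
The mutable default argument mentioned in the comments would explain the numbers in the question: 630 + 528 = 1158, which is exactly the length reported for both compounds. Below is a minimal sketch of that behavior, not the original training code. The names `SharedDefault`, `FreshDefault`, and `fill` are hypothetical, the appended tuples are placeholders for real model objects, and the counts are taken from the question's output.

```python
class SharedDefault:
    # Same pattern as the question's class: the default list is created once,
    # when the def statement is executed, and reused by every call that
    # omits `models`.
    def __init__(self, compound=None, models=[]):
        self.compound = compound
        self.models = models


class FreshDefault:
    # Common workaround: use None as a sentinel and build a new list
    # for each instance.
    def __init__(self, compound=None, models=None):
        self.compound = compound
        self.models = [] if models is None else models


def fill(cls, counts):
    """Create one instance per compound and append n placeholder models to it."""
    instances = {}
    for cpd, n in counts.items():
        instances[cpd] = cls(compound=cpd)
        for i in range(n):
            instances[cpd].models.append((cpd, i))  # placeholder, not a real model
    return instances


# Model counts per compound, taken from the question's "correct" output.
counts = {"AG1024": 630, "AG1478": 528}

shared = fill(SharedDefault, counts)
fresh = fill(FreshDefault, counts)

shared_lengths = {cpd: len(m.models) for cpd, m in shared.items()}
fresh_lengths = {cpd: len(m.models) for cpd, m in fresh.items()}
same_list = shared["AG1024"].models is shared["AG1478"].models

print(shared_lengths)  # both compounds report 1158
print(fresh_lengths)   # 630 and 528
print(same_list)       # True: both instances hold the very same list object
```

This would also account for why `summation` behaves: a float is immutable, so `+=` rebinds the attribute on that one instance, whereas `append` mutates the single list object that both instances share.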
