After reading the question "Least Astonishment" and the Mutable Default Argument and its answers, as well as the official documentation (https://docs.python.org/3/tutorial/controlflow.html#default-argument-values), I've written my ResultsClass so that each instance has its own separate lists and does not touch the defaults (at least, that is what should happen according to my newly gained understanding):

class ResultsClass:
    def __init__(self,
                 project=None,
                 badpolicynames=None,
                 nonconformpolicydisks=None,
                 diskswithoutpolicies=None,
                 dailydifferences=None,
                 weeklydifferences=None):
        self.project = project
        # Use None as the sentinel so every instance gets its own fresh list;
        # a list passed in by the caller is kept rather than silently dropped.
        self.badpolicynames = [] if badpolicynames is None else badpolicynames
        self.nonconformpolicydisks = [] if nonconformpolicydisks is None else nonconformpolicydisks
        self.diskswithoutpolicies = [] if diskswithoutpolicies is None else diskswithoutpolicies
        self.dailydifferences = [] if dailydifferences is None else dailydifferences
        self.weeklydifferences = [] if weeklydifferences is None else weeklydifferences

By itself, this works as expected:

for i, result in enumerate(results):
    result.diskswithoutpolicies.append("count is " + str(i))
    print(result.diskswithoutpolicies)

Output:

['count is 0']
['count is 1']
['count is 2']
['count is 3']
etc.
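
For reference, the difference between a shared mutable default and the None sentinel used in the class above can be seen in a minimal sketch. The class names SharedDefault and OwnList are hypothetical and exist only for this illustration; they are not part of the script.

```python
class SharedDefault:
    # Hypothetical class: the default list is created once, when the
    # function is defined, and reused by every call that omits `items`.
    def __init__(self, items=[]):
        self.items = items


class OwnList:
    # Hypothetical class: the None sentinel yields a fresh list per instance.
    def __init__(self, items=None):
        self.items = [] if items is None else items


a, b = SharedDefault(), SharedDefault()
a.items.append("x")
print(b.items)  # ['x'] because a and b share the single default list

c, d = OwnList(), OwnList()
c.items.append("x")
print(d.items)  # [] because each instance owns its list
```

Since ResultsClass follows the second pattern, its instances should not be sharing lists through the defaults.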

For context, this script gathers information from each project in our Google Cloud infrastructure. In this instance that is mainly: the disks that have a snapshot schedule associated with them, the scheduled snapshots taken of each disk within the last 24 hours, the disks whose schedule names do not fit our naming convention, and the disks that have no snapshot schedule associated with them at all.

In the full script I use this exact same ResultsClass. Yet inside multiple nested for loops, append again seems to add to a shared default list, and in all honesty I don't understand why. A shortened version of the code follows:

# Code to obtain a list of projects
results = [ResultsClass() for i in range(len(projects))]
for result in results:
    for project in projects:
        result.project = project
        # Code to obtain each zone in the project
        for zone in zones:
            # Code to get each disk in zone
            for disk in disks:
                resourcepolicy = disk.get('resourcePolicies')
                if resourcepolicy:
                    # Code to action if a resource policy exists (elided).
                    # When the policy name does not fit the naming convention:
                    result.badpolicynames.append(resourcepolicy[0].split('/')[-1])
                    result.nonconformpolicydisks.append(disk['id'])
                else:
                    result.diskswithoutpolicies.append(disk['id'])
        pprint(vars(result))

This then comes back with the results:

{'badpolicynames': [],
 'dailydifferences': None,
 'diskswithoutpolicies': ['**1098762112354315432**'],
 'nonconformpolicydisks': [],
 'project': '**project0**',
 'weeklydifferences': None}
{'badpolicynames': [],
 'dailydifferences': None,
 'diskswithoutpolicies': ['**1098762112354315432**', '**1031876156872354739**'],
 'nonconformpolicydisks': [],
 'project': '**project1**',
 'weeklydifferences': None}

Does a for loop (or multiple for loops) somehow negate the separate lists created within the ResultsClass? I need to understand why this is happening within Python and then how I can correct it.

Dustin Ingram
RobTheRobot16

2 Answers

As far as I can tell, the glaring problem is that you nest the results loop and the projects loop, when you should loop over only one of them. I'd suggest looping over the projects and creating a result for each one, instead of instantiating all the results in a list beforehand.

results = []
for project in projects:
    result = ResultsClass(project)
    # Code to obtain each zone in the project
    for zone in zones:
        # Code to get each disk in zone
        for disk in disks:
            resourcepolicy = disk.get('resourcePolicies')
            if resourcepolicy:
                # Code to action if a resource policy exists (elided).
                # When the policy name does not fit the naming convention:
                result.badpolicynames.append(resourcepolicy[0].split('/')[-1])
                result.nonconformpolicydisks.append(disk['id'])
            else:
                result.diskswithoutpolicies.append(disk['id'])
    results.append(result)
    pprint(vars(result))

With that, results is a list of ResultsClass instances and each result contains exactly one project. In your previous attempt, the inner loop ran over every project for every result, so each result accumulated the disks of all projects and ended up holding the same, last project.
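
To see why the lists only appeared to be shared, here is a minimal, self-contained sketch. The Result class and the disks_by_project dictionary are made-up stand-ins for ResultsClass and the Google Cloud calls, not the original code.

```python
class Result:
    # Stand-in for ResultsClass: one project name and one per-instance list.
    def __init__(self, project=None):
        self.project = project
        self.diskswithoutpolicies = []


# Made-up data: each project has a single disk without a policy.
disks_by_project = {"project0": ["disk-a"], "project1": ["disk-b"]}
projects = list(disks_by_project)

# Nested version: every result walks every project, so each list
# collects the disks of all projects and keeps the last project name.
nested = [Result() for _ in projects]
for result in nested:
    for project in projects:
        result.project = project
        result.diskswithoutpolicies.extend(disks_by_project[project])

# Single-loop version: one result per project.
flat = []
for project in projects:
    result = Result(project)
    result.diskswithoutpolicies.extend(disks_by_project[project])
    flat.append(result)

print([(r.project, r.diskswithoutpolicies) for r in nested])
print([(r.project, r.diskswithoutpolicies) for r in flat])
```

In the nested version the lists are still separate objects; they merely end up with equal contents because each one was filled from every project.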

r.ook
  • Ah! Thank you very much! I've been scratching my head over this for hours trying to work out what on earth I have been doing wrong. You've hit the nail on the head and have explained it incredibly well. Thanks again! – RobTheRobot16 Dec 13 '19 at 15:11

I'm not sure I understand correctly what you are trying to achieve, but you want to transfer the data from each project into its own result, right?

If so, you might want to use zip to get a single result per project:

for result, project in zip(results, projects):
    # rest of the code

Otherwise, each iteration of the inner loop overwrites the project that the previous iteration stored in the result.
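
A runnable sketch of the zip approach, assuming a stripped-down Result class and made-up project names in place of the real ResultsClass and project list:

```python
class Result:
    # Stand-in for ResultsClass with a single list attribute.
    def __init__(self):
        self.project = None
        self.diskswithoutpolicies = []


projects = ["project0", "project1", "project2"]  # made-up project names
results = [Result() for _ in projects]

# zip pairs results[0] with projects[0], results[1] with projects[1], ...
for result, project in zip(results, projects):
    result.project = project
    # Placeholder for the real per-project fetching code.
    result.diskswithoutpolicies.append("disk-of-" + project)

print([(r.project, r.diskswithoutpolicies) for r in results])
```

Note that zip stops at the shorter of its inputs; here both have the same length because results is built from projects.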

Another option would be to create the result in the loop:

results = []
for project in projects:
    result = ResultsClass(project)

    # ... your fetching code ...

    results.append(result)
Mr.Manhattan
  • Thanks very much! Both answers essentially got me to the right answer in the end - I needed results to be an empty list, and then have only one for loop rather than two! I was over-complicating matters :) – RobTheRobot16 Dec 13 '19 at 15:12