Elegance and Performance: When to use redundant lists for queries?

Question

EDIT: The question "When should and shouldn't you break away from OOP for speed/performance?" might be relevant to this question.

I'm sorry if my question is unclear; I'm an amateur hobbyist and, were I better educated, I might know some relevant jargon to be more specific. Allow me to use some simple example code.

class EmployeeRecords(object):
    """A record of all employees."""
    def __init__(self):
        super().__init__()
        self.employees = []
        # The following two attributes are redundant.
        self.at_office = {"LAN":[], "DET":[], "KAL":[]}
        self.in_thirties = []

    def register_employee(self, employee):
        """Register a new employee in the records.

        This entire method is redundant.

        """
        self.employees.append(employee)
        self.at_office[employee.office_code].append(employee)
        if 30 <= employee.age < 40:
            self.in_thirties.append(employee)

class Employee(object):
    """An employee record featuring relevant information for queries."""
    def __init__(self, first_name, last_name, age, office_code):
        super().__init__()
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.office_code = office_code


# Instantiation and what-not goes here.
...

print([x for x in my_records.employees if 30 <= x.age < 40])
# VS
print(my_records.in_thirties)

Which is more appropriate? Is the latter method generally considered to be bad form by the experts at SO?

--More Info--

It seems like it might be computationally more efficient to just add Employee instances to the relevant lists upon registration in the EmployeeRecords. However, I've been recently studying SQL (finally), and it seems that a big part of using it effectively is 'data normalization': removing redundant data from multiple tables that could otherwise be obtained by a deeper query.

I can see how, and agree that, having redundant data can invite bugs; why bother keeping all of these redundant lists updated when my queries can pull from a single list through object (or, in SQL's case, table) association? In the above example, the list comprehension will always return the correct information, but using my_records.in_thirties would yield unexpected results if I foolishly appended to my_records.employees directly instead of using my_records.register_employee.
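To make that failure mode concrete, here is a minimal sketch. It uses a trimmed copy of the classes above, and the employee names and values are made up purely for illustration:

```python
class Employee:
    def __init__(self, first_name, last_name, age, office_code):
        self.first_name = first_name
        self.last_name = last_name
        self.age = age
        self.office_code = office_code


class EmployeeRecords:
    def __init__(self):
        self.employees = []
        self.in_thirties = []  # redundant: derivable from self.employees

    def register_employee(self, employee):
        self.employees.append(employee)
        if 30 <= employee.age < 40:
            self.in_thirties.append(employee)


my_records = EmployeeRecords()
my_records.register_employee(Employee("Ann", "Lee", 34, "LAN"))
# Bypassing register_employee leaves the redundant list stale.
my_records.employees.append(Employee("Bob", "Ray", 37, "DET"))

by_query = [x.first_name for x in my_records.employees if 30 <= x.age < 40]
by_list = [x.first_name for x in my_records.in_thirties]
print(by_query)  # ['Ann', 'Bob']
print(by_list)   # ['Ann']
```

The comprehension sees both employees, while the redundant list silently misses the one that was appended directly.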

This is just an example where, in terms of code management and performance, there's little difference between either method. In practice, though, a query could involve searching through lists within lists of objects whose attributes are lists that need to be queried for other objects.

Is it considered good practice to avoid redundant lists for this purpose, or would most people consider such lists preferable to repeatedly doing very deep searches? I understand that Python is not SQL, but I think that OOP is very much about the relationships between objects through the use of attributes, and so I can see how these sorts of lists would be considered bad form and prone to bugs.

Thanks for your help. I have no formal education and, although I have years of programming experience with pet projects, I'm always learning new things in the way of effective architecture. This is my first post on SO after years of browsing it, so please be gentle if this is a dumb or inappropriate question. I don't know where else to turn!

-David Hernandez

This is really a programming design question more than a Python one. This kind of simple caching can be done in any language. — Karl Knechtel, Mar 27 '12 at 22:02

score 1 · Accepted Answer · answered Mar 27 '12 at 21:24

Basically it depends on how often you need a particular thing versus how often you need what it's based on.

If the only query you ever make about Employees is which ones are in their thirties, and you're running into performance issues making that query, then it makes sense to compute it ahead of time.

If, on the other hand, that's only one of many queries you're making, it makes less sense to clutter your data models with tons of precomputed things; keeping the models simple and computing+caching what you need when you need it will make your code much easier to work with.
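One way the "compute and cache when needed" idea could look is sketched below. This is only an illustration under assumptions: the `_cache` dictionary, the `in_thirties()` method, and the namedtuple stand-in for `Employee` are hypothetical names, not part of the question's code.

```python
from collections import namedtuple

# Hypothetical stand-in for the question's Employee class.
Employee = namedtuple("Employee", "first_name last_name age office_code")


class EmployeeRecords:
    def __init__(self):
        self._employees = []  # the single source of truth
        self._cache = {}      # query name -> cached result list

    def register_employee(self, employee):
        self._employees.append(employee)
        self._cache.clear()   # any cached result may now be stale

    def in_thirties(self):
        # Compute on first use, then reuse until the data changes.
        if "in_thirties" not in self._cache:
            self._cache["in_thirties"] = [
                e for e in self._employees if 30 <= e.age < 40
            ]
        return self._cache["in_thirties"]


records = EmployeeRecords()
records.register_employee(Employee("Ann", "Lee", 34, "LAN"))
records.register_employee(Employee("Cy", "Poe", 52, "KAL"))
print([e.first_name for e in records.in_thirties()])  # ['Ann']
```

The data model keeps one list, and the cache stays correct only as long as every change goes through `register_employee`, which is why the list is kept private here.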

Only optimize for performance what you need to optimize for performance, if doing so would come at the cost of maintainability/coding time. (See http://c2.com/cgi/wiki?PrematureOptimization.)
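If you're unsure whether a query is slow enough to matter, measuring it first is a reasonable step. The following is a rough sketch using the standard-library `timeit` module; the data set of 10,000 made-up employees and the function names `query` and `lookup` are assumptions for illustration, and the timings will vary by machine.

```python
import timeit
from collections import namedtuple

# Hypothetical stand-in for the question's Employee class.
Employee = namedtuple("Employee", "first_name last_name age office_code")

# Made-up data: ages cycle through 20..59.
employees = [
    Employee("First%d" % i, "Last%d" % i, 20 + i % 40, "LAN")
    for i in range(10000)
]
in_thirties = [e for e in employees if 30 <= e.age < 40]  # precomputed list


def query():
    """Filter the full list each time it is needed."""
    return [e for e in employees if 30 <= e.age < 40]


def lookup():
    """Return the precomputed redundant list."""
    return in_thirties


query_time = timeit.timeit(query, number=100)
lookup_time = timeit.timeit(lookup, number=100)
print("query:  %.4f s for 100 runs" % query_time)
print("lookup: %.4f s for 100 runs" % lookup_time)
```

Whether the difference justifies the extra bookkeeping depends on how often the query runs in your actual program.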

This is what I needed to hear, and, right after posting, I got a bit of enlightenment from the SO question that I posted in my edit. I've heard all about the pitfalls of premature optimization, but I've never written a program large enough that it's become much of an issue. In this case, you're right; I'll keep doing the deep queries until it becomes clear that I ~need~ to use redundant lists. Thank you very much! — vencabot_teppoo, Mar 27 '12 at 21:32
