Can't check if there is an object in a list before adding the same one during a loop

Question

I read data from the file doctor_data.txt and create an object from each line. Each object belongs to either the Private or the State subclass of the parent class Doctor (a line with four fields becomes a Private doctor; otherwise it becomes a State doctor). After defining the classes, I wrote a program that adds each Private or State doctor object to a list, doctorList = [], unless an object created from the same data line already exists in the list.

My loop adds each object to the list without any problem, but the check for whether the object already exists does not work. How can I fix this issue?

The loop part at the end:

doctor_data = open('/Users/berketurer/Desktop/doctor_data.txt', 'r')
doctorList = []

for line in doctor_data:
    data = line.split(";")


    if len(data) == 4:
        dname, title, patients, treatmentFee = data
        patients, treatmentFee = float(patients), float(treatmentFee)
        doc_obj = Private(dname, title, patients, treatmentFee)

    elif len(data) == 3:
        dname, title, salary = data
        salary = float(salary)
        doc_obj = State(dname, title, salary)

    if doc_obj not in doctorList:  # no idea why it doesn't detect that Alyssa occurs twice
        doctorList.append(doc_obj)

The output:

runfile('/Users/berketurer/Desktop/Lab__Berke_Turer.py', wdir='/Users/berketurer/Desktop')
[Assistant Professor - Alonzo Ballard Payment: 9250.0
, Assistant Professor - Tracey Russell Payment: 90000.0
, Associate Professor - Andrea Howard Payment: 20000.0
, Associate Professor - Rosalie West Payment: 35000.0
, Associate Professor - Sue Beck Payment: 139500.0
, Professor - Alyssa Padilla Payment: 150000.0 #twice
, Professor - Alyssa Padilla Payment: 150000.0 #twice
, Professor - Darryl Walker Payment: 100000.0
, Professor - Jeremiah Bailey Payment: 24750.0
, Specialist - Andrew Austin Payment: 12750.0
, Specialist - Lyle Romero Payment: 10250.0
]

The complete code:

class Doctor:

    def __init__(self, dname, title):
        self.__dname = dname
        self.__title = title

    def get_dname(self):
        return self.__dname
    def get_title(self):
        return self.__title

    def set_dname(self, newname):
        self.__dname = newname
    def set_title(self, newtitle):
        self.__title = newtitle

    def __eq__(self, other):
        if self.__dname == self.__title and other.__dname and other.__title:
            return True

    def __lt__(self, other):
        if self.__title < other.__title:
            return True
        elif self.__title == other.__title:
            if self.__dname < other.__dname:
                return True
        else:
            return False

    def __repr__(self):
        return "{} - {}".format(self.__title, self.__dname)


class Private(Doctor):

    def __init__(self, dname, title, patients, treatmentFee):
        Doctor.__init__(self, dname, title)
        self.__patients = patients
        self.__treatmentFee = treatmentFee

    def calculate_payment(self):
        self.__payment = self.__patients * self.__treatmentFee
        return self.__payment


    def __repr__(self):
        return "{} - {} Payment: {}\n".format(self.get_title(), self.get_dname(), self.calculate_payment())


class State(Doctor):

    def __init__(self, dname, title, salary):
        Doctor.__init__(self, dname, title)
        self.__salary = salary
        self.__baseBonus = 5000
        self.__payment = 0        

    def calculate_payment(self):

        if self.get_title() == "Professor" :
            self.__payment = self.__salary + self.__baseBonus * 1.25
            return self.__payment

        elif self.get_title() == "Associate Professor" : 
            self.__payment = self.__salary + self.__baseBonus
            return self.__payment

        elif self.get_title() == "Assistant Professor" or self.get_title() == "Specialist":
            self.__payment = self.__salary + self.__baseBonus * 0.75
            return self.__payment

    def __repr__(self):
        return "{} - {} Payment: {}\n".format(self.get_title(), self.get_dname(), self.calculate_payment())

doctor_data = open('/Users/berketurer/Desktop/doctor_data.txt', 'r')
doctorList = []

for line in doctor_data:
    data = line.split(";")

    if len(data) == 4:
        dname, title, patients, treatmentFee = data
        patients, treatmentFee = float(patients), float(treatmentFee)
        doc_obj = Private(dname, title, patients, treatmentFee)

    elif len(data) == 3:
        dname, title, salary = data
        salary = float(salary)
        doc_obj = State(dname, title, salary)

    if doc_obj not in doctorList:  # no idea why it doesn't detect that Alyssa occurs twice
        doctorList.append(doc_obj)

doctorList.sort()
print(doctorList)

score 1 · Accepted Answer · answered Apr 25 '20 at 19:18

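The issue is that the equality check currently only succeeds when both operands are the same object in memory, which is false for two objects built from duplicate lines. The __eq__ in the question compares self.__dname with self.__title instead of comparing each field with the corresponding field of other, so it does not report two doctors with the same name and title as equal. To overcome this, you need to define a correct custom equality method in Doctor. Look at the documentation for the __eq__ method or the related Stack Overflow question.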
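
A minimal sketch of such an equality method, assuming that two doctors count as duplicates when their name and title match (the question's __eq__ only looks at those two fields). The sample rows are hypothetical stand-ins for lines of doctor_data.txt, and the __hash__ method is an addition not present in the question:

```python
class Doctor:
    def __init__(self, dname, title):
        self.__dname = dname
        self.__title = title

    def __eq__(self, other):
        # Only compare against other Doctor instances (including subclasses).
        if not isinstance(other, Doctor):
            return NotImplemented
        # Compare each field with the matching field of the other object.
        return (self.__dname, self.__title) == (other.__dname, other.__title)

    def __hash__(self):
        # Defining __eq__ disables the default hash, so provide a consistent one.
        return hash((self.__dname, self.__title))

    def __repr__(self):
        return "{} - {}".format(self.__title, self.__dname)


# Hypothetical sample values, standing in for lines of doctor_data.txt.
rows = [
    ("Alyssa Padilla", "Professor"),
    ("Alyssa Padilla", "Professor"),
    ("Sue Beck", "Associate Professor"),
]

doctorList = []
for dname, title in rows:
    doc_obj = Doctor(dname, title)
    if doc_obj not in doctorList:  # now uses the field-by-field __eq__
        doctorList.append(doc_obj)

print(doctorList)
```

Because Private and State inherit from Doctor and do not override __eq__, they would pick up this comparison unchanged.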

score 0 · Answer 2 · answered Apr 25 '20 at 19:21

The in operator on a list of objects looks for the same instance, not for identical content, unless the class defines how its objects should be compared. When you create an instance of Private or State, it receives a new unique internal identifier, which is what the default comparison looks at. So, even if the contents of two objects are the same, their identifiers will be different.
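
A small illustration of that default behavior, using a hypothetical class Plain that defines no __eq__ at all:

```python
class Plain:
    # Hypothetical class with no __eq__, so == falls back to identity.
    def __init__(self, name):
        self.name = name


first = Plain("Alyssa Padilla")
second = Plain("Alyssa Padilla")  # same content, different object

items = [first]
same_object_found = first in items
equal_content_found = second in items

print(same_object_found)    # True: it is the very same object
print(equal_content_found)  # False: equal content is not enough by default
```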

You would need to either create a comparison function that takes into account every field of each object (which could become complex) or simply exclude duplicate lines from the input:

For example:

If duplicates can occur anywhere...

seenLine = set()
for line in doctor_data:
    if line in seenLine:
        continue  # skip a line that was already processed
    seenLine.add(line)
    # ...

If duplicates are always consecutive...

previousLine = None
for line in doctor_data:
    if line == previousLine:
        continue  # skip a line identical to the one just before it
    previousLine = line
    # ...

Alain T.

Duplicates can occur anywhere... What do you mean exactly by seenLine = set() ? – Berke Turer Apr 25 '20 at 20:22
A set() is a kind of collection that only keeps one copy of each item. It is also very fast to search, so you can use it to keep track of the lines you've already seen and check against it to skip over duplicate lines. – Alain T. Apr 25 '20 at 20:47
The loop doesn't seem to care about the seenLine checker, I tried both with continue and break statements. – Berke Turer Apr 25 '20 at 22:20
The set itself works as intended, though: the lines are not duplicated in the set. – Berke Turer Apr 25 '20 at 22:25
The lines probably have a slight difference (e.g. an extra space) that makes them distinct while the data is actually the same. You could try to build a comparison key after splitting the line on the semicolon and check that against seenLine instead of the raw data. For example: `lineKey = ";".join(s.strip() for s in data); if lineKey in seenLine: continue; seenLine.add(lineKey)` – Alain T. Apr 25 '20 at 23:00
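
The normalized-key idea from the last comment can be sketched as a runnable loop. This is only an illustration: the sample lines are hypothetical, fed through io.StringIO in place of the real doctor_data.txt, and the second line differs from the first only by stray spaces:

```python
import io

# Hypothetical sample data standing in for doctor_data.txt.
doctor_data = io.StringIO(
    "Alyssa Padilla;Professor;143750\n"
    "Alyssa Padilla ;Professor;143750 \n"
    "Sue Beck;Associate Professor;134500\n"
)

seenLine = set()
unique_rows = []

for line in doctor_data:
    # Strip whitespace (including the trailing newline) from every field.
    data = [s.strip() for s in line.split(";")]
    # Rebuild a canonical key so lines differing only in spacing compare equal.
    lineKey = ";".join(data)
    if lineKey in seenLine:
        continue
    seenLine.add(lineKey)
    unique_rows.append(data)

print(len(unique_rows))  # 2
```

Building the doctor objects from the stripped data list, rather than from the raw split, would also keep stray spaces out of the stored names and titles.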
