
I'm trying to create a pandas DataFrame from a list of objects I've already created. Each entry is added one at a time. In my function, before adding the new object, I want to check whether that value already exists. However, when my DataFrame is empty and I check for that object, it tells me it exists and ignores it.

So I create an empty data frame:

import pandas as pd

clienttable = pd.DataFrame(columns=['ClientNo', 'Last Name',
                                    'Full Name', 'DateofBirth'])

I then have 2 entries:

client1 = {'ClientNo': 0, 'Last Name': 'Doe', 'Full Name': 'John Doe',
           'DateofBirth': '12-12-1970'}

client2 = {'ClientNo': 1, 'Last Name': 'Mad', 'Full Name': 'Jim Mad',
           'DateofBirth': '12-1-1983'}
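A side note on the dates: written without quotes, a value like 12-12-1970 is not a date at all, because Python evaluates it as integer subtraction. A minimal check, assuming the dates are day-month-year (the question does not say which order is meant):

```python
from datetime import datetime

# Unquoted, the "date" is just arithmetic: 12 - 12 - 1970
print(12-12-1970)   # -1970
print(12-1-1983)    # -1972

# Quoted, it stays text and can be parsed; day-month-year order is an assumption
dob = datetime.strptime('12-1-1983', '%d-%m-%Y').date()
print(dob)          # 1983-01-12
```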

When I check my clienttable before adding anything, it shows as an empty DataFrame:

print(clienttable)    

Empty DataFrame
Columns: [ClientNo, Last Name, Full Name, DateofBirth]
Index: []

To check whether the entry already exists, I use:

if clienttable['ClientNo'].any() == client1['ClientNo']:
    print("ClientNo is already captured in the Table")
else:
    # build a one-row DataFrame for the new client
    entryadd = pd.DataFrame(data={'ClientNo': [client1['ClientNo']],
                                  'Last Name': [client1['Last Name']],
                                  'Full Name': [client1['Full Name']],
                                  'DateofBirth': [client1['DateofBirth']]})

For client2, where ClientNo is 1, this works without issue. It correctly identifies the first time that it is not there and adds it, and if I try to add it again it tells me it's already captured.

However, for client1, where ClientNo is 0, clienttable['ClientNo'].any() == client1['ClientNo'] returns True even though the table is still empty, so it never gets added.

I'm struggling to understand why, when the DataFrame is empty, df['ClientNo'].any() seems to think that an entry with value 0 is present.
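The behaviour can be reproduced in a few lines. .any() reports whether at least one element is truthy, so on an empty column it returns False; and because bool is a subclass of int in Python, False == 0 is True. A minimal sketch using the question's empty table:

```python
import pandas as pd

# Same empty table as in the question
clienttable = pd.DataFrame(columns=['ClientNo', 'Last Name', 'Full Name', 'DateofBirth'])

# .any() asks "is at least one element truthy?"; an empty column has none,
# so the result is False
result = clienttable['ClientNo'].any()
print(result)        # False

# False compares equal to 0, so ClientNo 0 looks "already captured" ...
print(result == 0)   # True
# ... but not to 1, so ClientNo 1 is added as expected
print(result == 1)   # False
```

In other words, the check never looks for the ClientNo at all; it compares a single True/False summary of the column with the number.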

Amir Afianian

2 Answers


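.any() returns True if any element of the Series is truthy (non-zero); for an empty Series it returns False. It does not check for a particular value. Instead you should do:

if client1['ClientNo'] in clienttable['ClientNo'].values:

This checks whether the value itself is present in the column. Note the .values: using in directly on a Series checks the index labels, not the values.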
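The difference between testing the index and testing the values is easy to miss, so here is a small sketch. It assumes a hypothetical state where only client2 (ClientNo 1) has been added so far, sitting at index label 0:

```python
import pandas as pd

# Hypothetical state: only client2 (ClientNo 1) has been added, at index 0
clienttable = pd.DataFrame({'ClientNo': [1], 'Last Name': ['Mad'],
                            'Full Name': ['Jim Mad'], 'DateofBirth': ['12-1-1983']})
col = clienttable['ClientNo']

# in on the Series itself tests the index labels (here just 0)
print(0 in col)              # True, although no client has ClientNo 0
print(1 in col)              # False, although ClientNo 1 is present

# Testing the values gives the intended answer
print(0 in col.values)       # False
print(1 in col.values)       # True
print(col.isin([1]).any())   # True
```

Either .values or isin works; isin also extends naturally to checking several ClientNo values at once.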

YOLO

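Another solution could be to use is instead of ==, because the problem is the comparison between False and 0. In Python, False is actually equal to 0 (bool is a subclass of int), so the False that .any() returns for the empty column compares equal to ClientNo 0. Please see Differentiate False and 0.

Note that is tests identity, not value. .any() always returns a boolean, which is never the same object as an integer ClientNo, so this version no longer mistakes an empty table for a match, but it also never reports a ClientNo that is already present.

The code using is: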

if clienttable['ClientNo'].any() is client1['ClientNo']:
    print("ClientNo is already captured in the Table")
else:
    entryadd = pd.DataFrame(data={
        'ClientNo': [client1['ClientNo']],
        'Last Name': [client1['Last Name']],
        'Full Name': [client1['Full Name']],
        'DateofBirth': [client1['DateofBirth']]
    })
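For comparison, a minimal sketch that compares the ClientNo against the stored values (element-wise ==, then .any()) instead of against a boolean summary of the column, and appends only when it is missing. add_client and COLUMNS are hypothetical names, and pd.concat is just one way to append a row (DataFrame.append was removed in pandas 2.0):

```python
import pandas as pd

COLUMNS = ['ClientNo', 'Last Name', 'Full Name', 'DateofBirth']


def add_client(table, client):
    """Return table with client appended, unless its ClientNo is already present."""
    # Element-wise comparison gives a boolean Series; .any() is then True only
    # if some row actually holds this ClientNo (an empty table gives False)
    if (table['ClientNo'] == client['ClientNo']).any():
        print("ClientNo is already captured in the Table")
        return table
    row = pd.DataFrame([client], columns=COLUMNS)
    # For an empty table, return the new row directly instead of
    # concatenating with an empty frame
    if table.empty:
        return row
    return pd.concat([table, row], ignore_index=True)


clienttable = pd.DataFrame(columns=COLUMNS)
client1 = {'ClientNo': 0, 'Last Name': 'Doe', 'Full Name': 'John Doe',
           'DateofBirth': '12-12-1970'}
client2 = {'ClientNo': 1, 'Last Name': 'Mad', 'Full Name': 'Jim Mad',
           'DateofBirth': '12-1-1983'}

clienttable = add_client(clienttable, client1)  # added
clienttable = add_client(clienttable, client2)  # added
clienttable = add_client(clienttable, client1)  # prints the "already captured" message
print(clienttable)
```

Returning a new DataFrame rather than mutating one in place keeps the helper simple; the caller just reassigns clienttable after each call.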
Dharman
ranaalisaeed