1

I'm new to Pandas/Python but am familiar with excel and what I'm trying to accomplish is essentially a v-lookup. I have a list of contract numbers that need to be compared to determine if each item in list A is in list B exactly time, if the item in list be has been matched to an item in list A then it cannot be matched again.

    coll = []
    coll = pd.DataFrame(coll)
    coll = coll.append([cardpay, achpay])


    audit0 = []
    audit0 = coll["Policy #"]
    audit0 = audit0.dropna()
    audit0 = pd.DataFrame(audit0)
    coll = coll.append([saves])
    coll.sort_values("UserName")
    coll = coll.reset_index(drop=True)

    audit1 = []
    audit1 = coll["PolicyNumber"]
    audit1 = audit1.dropna()
    audit1 = pd.DataFrame(audit1)

all of the data required is stored and sorted above

audit0 & 1 are redundant but make it easier to view for me

The count in "PolicyNumber" and "Policy #" do not match which has led to some issues in the

attempted solutions below

solution below leaves "Payment?" empty when prtined

    coll["Payment?"] = np.where((coll["PolicyNumber"].str.contains(str(audit0))), "yes", "no")

Solution below did not work but read this approach was not ideal along my trials

    for i, row in coll["PolicyNumber"]:
        if i == coll.loc[coll["Policy #"]]:
            print(True)
This gave error because the series are not of the same size(shape)
    coll["Payment?"] = coll.merge(coll[["PolicyNumber", "Policy #"]], on=["Policy #"], 
    how="right")

Any advice or maybe additional clarity on related posts that may answer my question already?

quaaalud
  • 13
  • 2

2 Answers2

0

You can use numpy broadcasting:

df1 = pd.DataFrame({
    'Policy #': [1,2,3]
})
df2 = pd.DataFrame({
    'Policy Number': [1, 1, 2, 4, 5]
})

a = df1['Policy #'].to_numpy()[:, None]
b = df2['Policy Number'].to_numpy()
count = (a == b).sum(axis=1)

df1[count == 1]

What is numpy broadcasting?

Code Different
  • 90,614
  • 16
  • 144
  • 163
  • Thank you for the suggestion but I still receive the error for the series not being the same length. I believe this is user error because of something else i have done to filter the data down and not your suggestion. – quaaalud Sep 11 '21 at 14:45
  • Did you miss the `[:, None]`? numpy broadcasting is designed for working with arrays of different lengths – Code Different Sep 11 '21 at 14:49
  • That was not the piece I missed originally but I must have messed something else up because re-attempting this solution gave the desired result! Thank you for your help! – quaaalud Sep 11 '21 at 16:07
0
  • convert listA to dictionaryA where dictionary keys are elements and values are counts - linear pass through listA. (see this for example)

  • convert listB to dictionaryB as above - linear pass through listB.

  • now, intersect set(dictionaryA) and set(dictionaryB) - these are common elements. common_elems = set(dictionaryA) & set(dictionaryB)

  • finally,

for elem in common_elems:
    if dictionaryB[elem] == 1: print elem