This is a weird one: I have 3 dataframes, "prov_data" with contains a provider id and counts on regions and categories (ie. how many times that provider interacted with those regions and categories).
prov_data = DataFrame({'aprov_id':[1122,3344,5566,7788],'prov_region_1':[0,0,4,0],'prov_region_2':[2,0,0,0],
'prov_region_3':[0,1,0,1],'prov_cat_1':[0,2,0,0],'prov_cat_2':[1,0,3,0],'prov_cat_3':[0,0,0,4],
'prov_cat_4':[0,3,0,0]})
"tender_data" which contains the same but for tenders.
tender_data = DataFrame({'atender_id':['AA12','BB33','CC45'],
'ten_region_1':[0,0,1,],'ten_region_2':[0,1,0],
'ten_region_3':[1,1,0],'ten_cat_1':[1,0,0],
'ten_cat_2':[0,1,0],'ten_cat_3':[0,1,0],
'ten_cat_4':[0,0,1]})
And finally a "no_match" DF wich contains forbidden matches between provider and tender.
no_match = DataFrame({ 'prov_id':[1122,3344,5566],
'tender_id':['AA12','BB33','CC45']})
I need to do the following: create a new df that will append the rows of the prov_data & tender_data DataFrames if they (1) match one or more categories (ie the same category is > 0) AND (2) match one or more regions AND (3) are not on the no_match list.
So that would give me this DF:
df = DataFrame({'aprov_id':[1122,3344,7788],'prov_region_1':[0,0,0],'prov_region_2':[2,0,0],
'prov_region_3':[0,1,1],'prov_cat_1':[0,2,0],'prov_cat_2':[1,0,0],'prov_cat_3':[0,0,4],
'prov_cat_4':[0,3,0], 'atender_id':['BB33','AA12','BB33'],
'ten_region_1':[0,0,0],'ten_region_2':[1,0,1],
'ten_region_3':[1,1,1],'ten_cat_1':[0,1,0],
'ten_cat_2':[1,0,1],'ten_cat_3':[1,0,1],
'ten_cat_4':[0,0,0]})