0

I have a problem with pandas DataFrame - I don't understand how I can create new rows and merge them with a dictionary.

My theoretical situation, that I have daily reports from salesmen, who check whether Network Name have particular Category Name on their shelfs. All salesmen simply fill the report and indicate their visit as VisitorID and if a Network Name don't have any Category Name they just don't write anything.

I need to create a new report, where missing Category Name will be filled with zeros.

For instanse, I have next report:

shops = [{'VisitorID': '1434', 'Network Name': 'Sela', 'Category Name': 'Fruits', 'Value': 10},
     {'VisitorID': '1434', 'Network Name': 'Sela', 'Category Name': 'Tobacco', 'Value': 20},
     {'VisitorID': '1301', 'Network Name': 'Sela', 'Category Name': 'Fruits', 'Value': 30},
     {'VisitorID': '4012', 'Network Name': 'Mega', 'Category Name': 'Fruits', 'Value': 40},
     {'VisitorID': '4012', 'Network Name': 'Mega', 'Category Name': 'Tobacco', 'Value': 50},
     {'VisitorID': '4012', 'Network Name': 'Mega', 'Category Name': 'Vegetables', 'Value': 60},]

Initial

At the same time, I have a dictionary dataframe with Network Name and Category Name connection. Every Network Name must have next Category Name on their shelfs:

net_cat = [{'Network Name': 'Sela', 'Category Name': 'Fruits'},
{'Network Name': 'Sela', 'Category Name': 'Tobacco'},
{'Network Name': 'Sela', 'Category Name': 'Cosmetics'},
{'Network Name': 'Mega', 'Category Name': 'Paper'},
{'Network Name': 'Mega', 'Category Name': 'Vegetables'},
{'Network Name': 'Mega', 'Category Name': 'Tobacco'},
{'Network Name': 'Mega', 'Category Name': 'Fruits'}]

Distionary

So, in result I want to have next dataframe:

output = [{'VisitorID': '1434', 'Network Name': 'Sela', 'Category Name': 'Fruits', 'Value': 10},
         {'VisitorID': '1434', 'Network Name': 'Sela', 'Category Name': 'Tobacco', 'Value': 20},
         {'VisitorID': '1434', 'Network Name': 'Sela', 'Category Name': 'Cosmetics', 'Value': 0},
         {'VisitorID': '1301', 'Network Name': 'Sela', 'Category Name': 'Fruits', 'Value': 30},
         {'VisitorID': '1301', 'Network Name': 'Sela', 'Category Name': 'Tobacco', 'Value': 0},
         {'VisitorID': '1301', 'Network Name': 'Sela', 'Category Name': 'Paper', 'Value': 0},
         {'VisitorID': '4012', 'Network Name': 'Mega', 'Category Name': 'Fruits', 'Value': 40},
         {'VisitorID': '4012', 'Network Name': 'Mega', 'Category Name': 'Tobacco', 'Value': 50},
         {'VisitorID': '4012', 'Network Name': 'Mega', 'Category Name': 'Vegetables', 'Value': 60},
         {'VisitorID': '4012', 'Network Name': 'Mega', 'Category Name': 'Fruits', 'Value': 0},]

Result

Thanks!

  • 1
    Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Henry Yik Nov 17 '20 at 16:22

1 Answers1

1

A little bit hard than what I thought , we need first reindex after reshaoe with unstack , then find the correct one to fillna with 0 , since you do not want create additional row if all value are NaN

out = shops.set_index(['VisitorID','Network Name' ,'Category Name']).unstack('VisitorID').reindex(pd.MultiIndex.from_frame(net_cat))
out = out.mask(out.groupby(level=0).apply(lambda x : x.ffill().bfill()).notna()&out.isna(),0).stack().reset_index()
Out[113]: 
  Network Name Category Name VisitorID  Value
0         Sela        Fruits      1301   30.0
1         Sela        Fruits      1434   10.0
2         Sela       Tobacco      1301    0.0
3         Sela       Tobacco      1434   20.0
4         Sela     Cosmetics      1301    0.0
5         Sela     Cosmetics      1434    0.0
6         Mega         Paper      4012    0.0
7         Mega    Vegetables      4012   60.0
8         Mega       Tobacco      4012   50.0
9         Mega        Fruits      4012   40.0
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thanks! I don't fully comprehend how it actually works, but it's awesome! Will try to understand each step, thanks – Vemen Busat Nov 17 '20 at 18:14