0

I have 2 dataframes. I currently have the code below but I'm now sure how to search for the customer ID in dataframe 2 to place the quantity in the corresponding product column.

Dataframe 1 (INPUT DF): 
CustomerID Product Quantity
123        Ball    2
123        Leash   2
456        Ball    1

Dataframe 2 (OUTPUT DF):
CustomerID Ball Leash
123        0    0
456        0    0     

I want them to look like this

Dataframe 1 (INPUT):
CustomerID Product Quantity
123        Ball    2
123        Leash   2
456        Ball    1

Dataframe 2 (OUTPUT):
CustomerID Ball Leash
123        2    2
456        1    0

Please let me know if I need to explain further.

### Adding how many for each customer
for index, row in df2.iterrows():
   ID = row["Customer_ID"]
   Product = row["Product_Name"]
   Quantity = row["Quantity"]
   df.loc[df.index[df[ID]], Product] = Quantity

2 Answers2

2

If I understood correctly you want to input the counts of each product per user from df1, I would generate this from the df1 itself:

  1. create one column per product, with the respective quantity
  2. groupby customer_id and sum all the Product columns
import pandas as pd
from io import StringIO

df1= StringIO("""CustomerID;Product;Quantity
123;Ball;2
123;Leash;2
456;Ball;1""")

df1=pd.read_csv(df1,sep=";")
unique_columns = list(df1["Product"].unique())

def productsAsColumns(row):
    columns = {c:0 for c in unique_columns}
    columns[row["Product"]] = row["Quantity"]
    return columns 

df1[unique_columns] = df1.apply(productsAsColumns, axis=1,result_type="expand")
df1.drop(columns=["Product","Quantity"],inplace=True)
df1 = df1.groupby("CustomerID").apply(sum)[unique_columns].reset_index() 
print(df1)

OUTPUT

   CustomerID  Ball  Leash
0         123     2      2
1         456     1      0
0

I think you're looking for df.pivot. From the pandas docs:

Reshape data (produce a “pivot” table) based on column values. Uses unique values from specified index / columns to form axes of the resulting DataFrame. This function does not support data aggregation, multiple values will result in a MultiIndex in the columns. See the User Guide for more on reshaping.

Using pivot in your case results in CustomerID in the index and Product in the columns:

In [4]: df.pivot('CustomerID', 'Product', 'Quantity')                                                                                                                                                     
Out[4]: 
Product     Ball  Leash
CustomerID             
123          2.0    2.0
456          1.0    NaN

You can then use fillna(0) to get 0 in the remaining cell.

Michael Delgado
  • 13,789
  • 3
  • 29
  • 54