0

I have two tables-Price_list and order_list. The price_list table gives me all the prices that were active with date from all stores by product_id. While order_list gives me the list of orders placed i.e. who placed the order and from which store.

Price_list - date, product_id, store_id, selling_price

order_list - date, product_id, store_id, selling_price, order_id, email, product_order_id (unique key - concatenation of product_id and order_id as there could more than one product in an order)

I want to combine the above two tables in such a way that for each product_order_id i get a list of all prices that were available for the product. Basically i want to see what were the prices available and what did the customer choose. The table below illustrates my query.

|product_order_id    Date     product_id    store_id   selling_price  Placed|
|134323_3545       2016/03/11    134323         6433       2560.00     Yes |
|134323_3545       2016/03/11    134323         6343       2534.00     No  |
|134323_3545       2016/03/11    134323         1243       2313.00     No  |
|134323_3545       2016/03/11    134323         2424       2354.00     No  |
|145565_9965       2016/03/11    145565         9887       5432.00     No  |
|145565_9965       2016/03/11    145565         7645       5321.00     Yes |

I am not able to get around to solving this in R. Although i prefer R for this, i am open if there is a solution in mysql or python. The steps to get this done is (a) select product_order_id (B) on that date for each product_id in the product_order_id search for all entries in price_list (C) append this to a table and add a column specifying product_order_id this list applies to (d) repeat the steps for the next product_order_id. Once the dataframe is prepared i can left join order_list table on column(product_order_id) to get the final dataframe. I have not yet been able to grasp how to do it in R.


After reading about loops and some help i was able to create a loop for searching all price entries for each product_id on a day (product_date is a concatenation of date and product_id):

datalist <- list()
for(i in (orderlisit_test$product_date){
  dat <- filter(pricelist, pricelist$product_date==i)
  datalist[[i]] <- dat
}

big_data = do.call("rbind", datalist)

However, i also want to add another column specifying the order_id or product_order_id for each iteration. So if anyone could help me in how should i loop as well as add another column at the same time that will help me a lot.

Community
  • 1
  • 1
Arshad Islam
  • 107
  • 1
  • 1
  • 10

1 Answers1

0

This will retain all the rows for every product_id

library(dplyr)
order_list_joined<-full_join(Price_list,order_list,by="product_id")

Then if there is no order_id for a given product_id, we assume there is no order place.

order_list_joined<-order_list_joined %>% mutate(Placed = ifelse(is.na(order_id),"No","Yes")
adaien
  • 1,932
  • 1
  • 12
  • 26
  • I dont think left join will work. this will only give me the details of the stores from which the order was placed. Apart from the list of stores orders were successfully plaed, I also want a list of stores from which order could have been placed for each order_id. – Arshad Islam Mar 16 '16 at 11:42
  • Sorry if I misunderstood, do you want for every product_order_id the list of prices present in the other table, where the the element which join the two tables is the product_id? – adaien Mar 16 '16 at 11:46
  • yep...exactly...what to know all the possibilities. Also the prices change daily, so its product_id and "date" combined. – Arshad Islam Mar 16 '16 at 11:53
  • The edit should to the trick. If not, please post some sample rows from both your databases to let me understand better – adaien Mar 16 '16 at 11:57
  • outer join will work if there was one order for each product...but here we may have multiple order from different users for the same product. The data i have shared is akin to the original data. I have also made a few edits for better understanding. Thanks for the help. – Arshad Islam Mar 16 '16 at 12:17