Suppose I have this data frame, df:
df <- data.frame(
  "City" = c("Boston", "Boston", "Boston", "Boston", "Boston", "Boston",
             "Boston", "Boston", "Boston", "New York", "New York", "New York"),
  "Store_ID" = c("00002", "00002", "00002", "00002", "00004", "00004",
                 "00004", "00004", "00004", "00011", "00011", "00011"),
  "Customer_ID" = c("10001", "10001", "10001", "23847", "17823", "17823",
                    "17823", "17823", "17823", "24232", "24232", "27381"),
  "Product_ID" = c("00013", "00013", "00058", "00013", "00899", "00847",
                   "00065", "00065", "00065", "00096", "00085", "00175"),
  "Payment" = c("Cash", "Cash", "Cash", "Card", "Card", "Card", "Card",
                "Card", "Card", "Card", "Card", "Cash"))
Let's say I want to know how many products were sold in each city. Then I'd use this code:
library(dplyr)
df2 <- df %>% group_by(City) %>% summarise(Quantity = n())
Or, if I want to know the quantity of products sold in each store, I can expand the previous code like this:
df2 <- df %>% group_by(City, Store_ID) %>% summarise(Quantity = n())
However, this further splits the data frame, and now I cannot see the total number of products sold in each city. Is it possible to create a new data frame that contains counts for several different groups but has only one row per value of a more encompassing variable such as City or Store_ID?
An example of the output I'm looking for, shown for store 00002 only, would be:
Store  Total_Sales  Customer10001_purchases  Customer23847_purchases  Cash% (ratio of items paid in cash)
00002  4            3                        1                        0.75
Is it possible to do this through dplyr? I'm also open to any other suggestions. Really appreciate the assistance!
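To make the target concrete, here is a minimal sketch of one way the per-store table might be built. It assumes the tidyr package is available alongside dplyr (pivot_wider is not part of dplyr itself), and the names store_totals, customer_counts, result and Cash_ratio are made up for illustration. The data frame is rebuilt compactly with rep() so the snippet runs on its own; it holds the same rows as df above.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; used only for pivot_wider()

# Same rows as df above, rebuilt compactly so this snippet is self-contained
df <- data.frame(
  City        = rep(c("Boston", "New York"), times = c(9, 3)),
  Store_ID    = rep(c("00002", "00004", "00011"), times = c(4, 5, 3)),
  Customer_ID = c(rep("10001", 3), "23847", rep("17823", 5),
                  rep("24232", 2), "27381"),
  Product_ID  = c("00013", "00013", "00058", "00013", "00899", "00847",
                  "00065", "00065", "00065", "00096", "00085", "00175"),
  Payment     = c(rep("Cash", 3), rep("Card", 8), "Cash")
)

# One row per store: number of items sold and share of items paid in cash
store_totals <- df %>%
  group_by(Store_ID) %>%
  summarise(Total_Sales = n(),
            Cash_ratio  = mean(Payment == "Cash"),
            .groups = "drop")

# One column per customer holding that customer's item count in each store;
# customers who never bought from a store get 0
customer_counts <- df %>%
  count(Store_ID, Customer_ID) %>%
  pivot_wider(names_from  = Customer_ID,
              values_from = n,
              names_glue  = "Customer{Customer_ID}_purchases",
              values_fill = 0)

# Combine both summaries, still one row per store
result <- left_join(store_totals, customer_counts, by = "Store_ID")
```

In this sketch every customer gets a column in every store's row, so the result is wide and mostly zeros. Swapping Store_ID for City in both steps would give the same layout per city.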