0

I have a data set that looks like below. The dataset has been taken from Kaggle:

https://www.kaggle.com/aksha17/superstore-sales

  head(supstore_df, 10)
   ï..Row.ID       Order.ID Order.Date Ship.Date      Ship.Mode Customer.ID   Customer.Name
1          1 CA-2016-152156   08-11-16  11-11-16   Second Class    CG-12520     Claire Gute
2          2 CA-2016-152156   08-11-16  11-11-16   Second Class    CG-12520     Claire Gute
3          3 CA-2016-138688   12-06-16  16-06-16   Second Class    DV-13045 Darrin Van Huff
4          4 US-2015-108966   11-10-15  18-10-15 Standard Class    SO-20335  Sean O'Donnell
5          5 US-2015-108966   11-10-15  18-10-15 Standard Class    SO-20335  Sean O'Donnell
6          6 CA-2014-115812   09-06-14  14-06-14 Standard Class    BH-11710 Brosina Hoffman
7          7 CA-2014-115812   09-06-14  14-06-14 Standard Class    BH-11710 Brosina Hoffman
8          8 CA-2014-115812   09-06-14  14-06-14 Standard Class    BH-11710 Brosina Hoffman
9          9 CA-2014-115812   09-06-14  14-06-14 Standard Class    BH-11710 Brosina Hoffman
10        10 CA-2014-115812   09-06-14  14-06-14 Standard Class    BH-11710 Brosina Hoffman
     Segment       Country            City      State Postal.Code Region      Product.ID
1   Consumer United States       Henderson   Kentucky       42420  South FUR-BO-10001798
2   Consumer United States       Henderson   Kentucky       42420  South FUR-CH-10000454
3  Corporate United States     Los Angeles California       90036   West OFF-LA-10000240
4   Consumer United States Fort Lauderdale    Florida       33311  South FUR-TA-10000577
5   Consumer United States Fort Lauderdale    Florida       33311  South OFF-ST-10000760
6   Consumer United States     Los Angeles California       90032   West FUR-FU-10001487
7   Consumer United States     Los Angeles California       90032   West OFF-AR-10002833
8   Consumer United States     Los Angeles California       90032   West TEC-PH-10002275
9   Consumer United States     Los Angeles California       90032   West OFF-BI-10003910
10  Consumer United States     Los Angeles California       90032   West OFF-AP-10002892
          Category Sub.Category
1        Furniture    Bookcases
2        Furniture       Chairs
3  Office Supplies       Labels
4        Furniture       Tables
5  Office Supplies      Storage
6        Furniture  Furnishings
7  Office Supplies          Art
8       Technology       Phones
9  Office Supplies      Binders
10 Office Supplies   Appliances
                                                       Product.Name    Sales Quantity
1                                 Bush Somerset Collection Bookcase 261.9600        2
2       Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back 731.9400        3
3         Self-Adhesive Address Labels for Typewriters by Universal  14.6200        2
4                     Bretford CR4500 Series Slim Rectangular Table 957.5775        5
5                                    Eldon Fold 'N Roll Cart System  22.3680        2
6  Eldon Expressions Wood and Plastic Desk Accessories, Cherry Wood  48.8600        7
7                                                        Newell 322   7.2800        4
8                                    Mitel 5320 IP Phone VoIP phone 907.1520        6
9              DXL Angle-View Binders with Locking Rings by Samsill  18.5040        3
10                                 Belkin F5C206VTEL 6 Outlet Surge 114.9000        5
   Discount    Profit
1      0.00   41.9136
2      0.00  219.5820
3      0.00    6.8714
4      0.45 -383.0310
5      0.20    2.5164
6      0.00   14.1694
7      0.00    1.9656
8      0.20   90.7152
9      0.20    5.7825
10     0.00   34.4700
> 

I would like to combine the rows of the same order in such a way that the Order ID, the customer name, the segment, the category and the sub-category of all the orders that have been split into different rows is combined into one row, and the items ordered are concatenated in that that row, in a single field.

I have tried this:

df_supstore_order_list <- ddply(supstore_dfcopy,c("Customer.Name", "Product.Name", "Customer.ID", "Segment", "Category", "Sub.Category"), function(supstore_dfcopy)paste(supstore_dfcopy$Product.Name,supstore_dfcopy$Customer.ID, supstore_dfcopy$Segment, supstore_dfcopy$Category, supstore_dfcopy$Sub.Category                                               collapse = ","))

But the resultant data frame looks like this:

head(df_supstore_order_list, 5)
    Customer.Name                                                Product.Name Customer.ID
1     Claire Gute                           Bush Somerset Collection Bookcase    CG-12520
2     Claire Gute Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back    CG-12520
3 Darrin Van Huff   Self-Adhesive Address Labels for Typewriters by Universal    DV-13045
4  Sean O'Donnell               Bretford CR4500 Series Slim Rectangular Table    SO-20335
5  Sean O'Donnell                              Eldon Fold 'N Roll Cart System    SO-20335
    Segment        Category Sub.Category
1  Consumer       Furniture    Bookcases
2  Consumer       Furniture       Chairs
3 Corporate Office Supplies       Labels
4  Consumer       Furniture       Tables
5  Consumer Office Supplies      Storage
                                                                                                   V1
1                             Bush Somerset Collection Bookcase CG-12520 Consumer Furniture Bookcases
2      Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back CG-12520 Consumer Furniture Chairs
3 Self-Adhesive Address Labels for Typewriters by Universal DV-13045 Corporate Office Supplies Labels
4                    Bretford CR4500 Series Slim Rectangular Table SO-20335 Consumer Furniture Tables
5                            Eldon Fold 'N Roll Cart System SO-20335 Consumer Office Supplies Storage

As you can see, the customer name etc. is not being combined into a single column, the way I am looking for it to be. Any suggestions about how this could be done?

  • Can you provide a reproducible example of your dataset (see this link: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – dc37 Mar 19 '20 at 03:00
  • @dc37 Thanks. I am new to all of this. I have directly included the link to the data now. Would that help? Here is the link: https://www.kaggle.com/aksha17/superstore-sales – Kaustubh Mulay Mar 19 '20 at 03:06
  • You need to be register to download those data so it is not really helpful for other people. Please provide the output of `dput( head(supstore_df, 10))`, it will be much more helpful. – dc37 Mar 19 '20 at 03:08
  • I tried that and ended up exceeding the word count for the body. – Kaustubh Mulay Mar 19 '20 at 03:12
  • Remove first exmaple data you provided, it should reduce the size of the text in your question – dc37 Mar 19 '20 at 03:14
  • you need a combination of `group_by` and `toString`. For example, `supstore_df %>% group_by(Customer.Name) %>% summarise(Category = toString(Category))`. Add more groups in `group_by` if needed. It is not clear to me which columns you are trying to summarise. – Ronak Shah Mar 19 '20 at 03:15
  • @dc37 Will try and let you know. – Kaustubh Mulay Mar 19 '20 at 03:19

1 Answers1

0

Whilst it is not too clear what you want to combine, as I understand it you would like to combine

  1. Customer Identifiers into 1 column
  2. Product Identifiers into 1 column.

ie. for the same person making multiple orders, you would like to combine those orders onto one row.

Hence,

Name   ID   Item       Category    ItemsOrdered
John    1   book ----> John, 1     book, toy
John    1   toy

So assuming that assumption is correct (let me know if it isn't).

df <- data.frame(name = c('John', 'John', 'Jane', 'Jane'), id = c(1, 1, 2, 2), item = c('chair' , 'desk', 'hat' , 'shirt'))

df %>% 
  # Group by columns that identify Items you would like in the same row
  group_by(name, id) %>% 
  # paste together all items with ", "
  summarise(ItemsOrdered = paste(item, collapse = ', ')) %>% 
  # Unite the Columns you grouped by
  unite(col = Category, name, id, sep = ', ') 

# A tibble: 2 x 2
  Category ItemsOrdered
  <chr>    <chr>       
1 Jane, 2  hat, shirt  
2 John, 1  chair, desk 
Croote
  • 1,382
  • 1
  • 7
  • 15