My sales dataset includes 3 columns: Countries, Sales Type/Method, Total Quarterly Revenue. Here's a display of the first few rows for a better idea:
Retailer.country Order.method.type Qtr.Rev
<fctr> <fctr> <dbl>
1 Australia E-mail 171407.28
2 Australia Sales visit 2013909.18
3 Australia Special 158795.34
4 Australia Telephone 2289201.87
5 Australia Web 1738303.59
6 Austria Sales visit 66926.18
7 Austria Telephone 1671887.40
8 Austria Web 7050164.50
9 Belgium Sales visit 1655507.05
10 Belgium Web 6222440.26
etc.........
Here's the dput of this data:
structure(list(Retailer.country = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L,
7L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 14L,
14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 17L, 17L, 17L,
17L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 20L,
21L, 21L, 21L, 21L, 21L, 21L), .Label = c("Australia", "Austria",
"Belgium", "Brazil", "Canada", "China", "Denmark", "Finland",
"France", "Germany", "Italy", "Japan", "Korea", "Mexico",
"Netherlands",
"Singapore", "Spain", "Sweden", "Switzerland", "United Kingdom",
"United States"), class = "factor"), Order.method.type =
structure(c(1L,
4L, 5L, 6L, 7L, 4L, 6L, 7L, 4L, 7L, 7L, 1L, 2L, 4L, 7L, 2L, 4L,
6L, 7L, 4L, 7L, 4L, 7L, 2L, 4L, 6L, 7L, 1L, 3L, 4L, 7L, 1L, 2L,
4L, 5L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 4L, 6L, 7L, 4L, 5L, 7L,
2L, 3L, 6L, 7L, 2L, 5L, 6L, 7L, 2L, 3L, 6L, 7L, 1L, 7L, 2L, 4L,
5L, 6L, 7L, 1L, 2L, 4L, 6L, 7L, 2L, 3L, 4L, 5L, 6L, 7L), .Label =
c("E-mail",
"Fax", "Mail", "Sales visit", "Special", "Telephone", "Web"), class =
"factor"),
Qtr.Rev = c(171407.28, 2013909.18, 158795.34, 2289201.87,
1738303.59, 66926.18, 1671887.4, 7050164.5, 1655507.05,
6222440.26,
7746789.52, 6864270.12, 195549.5, 450628.79, 12376528.53,
415128.31, 1453194.14, 2735416.3, 15777880.11, 413978.16,
3776833.13, 308638.6, 12328172.97, 709194.65, 1304167.86,
5897377.14, 11048160.97, 1546079.43, 1247170.05, 2373591.15,
12102240.99, 2461322.51, 165800.42, 1397604.56, 198705.05,
7413833.64, 2662351.94, 289704.5, 680467.87, 87186.72, 343708.86,
1802166.73, 16990817.52, 2821127.32, 431860.34, 10144353.75,
5063353.42, 1725508.54, 3571760.87, 593828.88, 1074860.66,
2981026.86, 5254137.56, 469627.61, 908725.05, 1625096.56,
9677070.09, 88788.41, 337710.73, 254360.21, 7835117.44,
1292812.39,
4818848.86, 217936.39, 792168.42, 790344.28, 109161.04,
4565896.64,
697619.35, 264500.2, 189218.02, 2022968.96, 13756025.4,
1357389.56,
2352483.29, 2842600.85, 685752.21, 13437403.28, 29573813.7
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-79L), .Names = c("Retailer.country", "Order.method.type", "Qtr.Rev"
))
I'm creating a contingency table in R that shows the quarterly revenue generated by each sales method for each country. The final output should look similar to this:
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web TOTAL.cn
1 Australia 171407.3 0.00 0.0 2013909.18 158795.3 2289201.9 1738304 6371617
2 Austria 0.0 0.00 0.0 66926.18 0.0 1671887.4 7050164 8788978
3 Belgium 0.0 0.00 0.0 1655507.05 0.0 0.0 6222440 7877947
4 Brazil 0.0 0.00 0.0 0.00 0.0 0.0 7746790 7746790
5 Canada 6864270.1 195549.50 0.0 450628.79 0.0 0.0 12376529 19886977
6 China 0.0 415128.31 0.0 1453194.14 0.0 2735416.3 15777880 20381619
7 Denmark 0.0 0.00 0.0 413978.16
...
20 United Kingdom 697619.3 264500.20 0.0 189218.02 0.0 2022969.0 13756025 16930332
21 United States 0.0 1357389.56 2352483.3 2842600.85 685752.2 13437403.3 29573814 50249443
22 TOTAL.type 15695863.0 4767448.43 5692692.6 23233800.42 4811539.3 35257926.7 203769190 293228461
The cast() function from the reshape library gets most of the job done, and only leaves the summary column and row for all values to be computed.
cast(sales.by.country, Retailer.country ~ Order.method.type,
fill=0) -> sales.by.country
Summing the rows into a new column named "TOTAL.cn" is pretty simple:
sales.by.country$TOTAL.cn <- rowSums(sales.by.country[,c(2:8)])
But summing the columns become a major headache, because the first component of that last row must either be a factor or character. I converted the first column "Retailer.country" to the character type, because it really serves as nothing more than a visual label.
After fussing around with several functions, this is the best code I was able to create to achieve the intended row summation:
# Sum the numeric columns, which is everything *except* column 1
total.by.ordertype <- (colSums(sales.by.country[,-1]))
# Create the Total by Order row
total.by.ordertype.row <- list("TOTAL.type", total.by.ordertype[1],
total.by.ordertype[2], total.by.ordertype[3], total.by.ordertype[4],
total.by.ordertype[5], total.by.ordertype[6], total.by.ordertype[7],
total.by.ordertype[8])
# Add the Total by Order row to the bottom of the table
sales.by.country[22, ] <- total.by.ordertype.row
It works and maintains the proper data types among all columns... BUT I figure there has to be a more efficient way, perhaps by using the apply family of functions, something from dplyr, etc. Maybe the only way is to write my own function?
For example, future datasets may have 50+ different sales methods. When creating the list for the "Total by Order" row (above), I had to call out every single cell within the vector, separated by commas, in order to successfully add this to my existing table. Other efforts converted the data types of all the other columns to characters, which messes everything up.
I don't mind copying/pasting "total.by.ordertype" 8 times so much. But what happens when I'm dealing with 50-100 order types? Is there a cleaner method to reproduce any of this?
Thank you!