0

I have a data frame with 3 columns: category, brand, and total count of orders made per brand in a specific category:

  X2            X1        total.count
Accessories    3MT          2098
Accessories    A82          1834
Accessories    A14           109
Tops           A79            87
Tops           A74           435

I have 150 different categories and 1700 different brands.

I want to create a data frame with the category, the most ordered brand in that category, and the number of orders for that brand. So far, I used subsetting to retrieve the top brand in each category, but I could only figure how to do it one by one:

b1 <- subset (DF, X2 == "Accessories")
b1$total.count<- sort(b1$total.count, decreasing = TRUE)

From there I took the first row:

s1 = head(b1, n=1)

I had to do this for each category, and then bound them all together into one data frame:

topbrands = data.frame(rbind(head(b1,n = 
1),head(b2,n=1),head(b3,n=1),head(b4, n=1),head(b5,n=1)....

all the way until b150. Is there a sleeker or more efficient way to do this?

anrpet
  • 139
  • 1
  • 7

2 Answers2

2

Is this what you are looking for?


dplyr

library(dplyr)
df %>% group_by(X2) %>% filter(total.count==max(total.count)) %>% as.data.frame

Or with data.table:

dt = data.table(df)
dt[ , .SD[which.max(total.count)], by = X2]

output:

           X2  X1 total.count
1 Accessories 3MT        2098
2        Tops A74         435
Florian
  • 24,425
  • 4
  • 49
  • 80
1
do.call(rbind, lapply(split(df,df$X2), function(x) {return(x[which.max(x$total.count),])}))

Check if this helps!

Kalees Waran
  • 659
  • 6
  • 13