This question: splitting a continuous variable into groups of equal number of elements - return numeric vector from bin values explains how to split a variable into bins, but here I want to split a variable based on summarized frequencies in another variable. Is it possible in r?
I have a data.frame with two columns: one with prices and another with sum of sales in units.
PRICE UNITS
300 46
400 1817
500 2409
600 1667
700 4282
800 2301
900 608
1000 335
1100 235
1200 90
1300 180
1400 590
1500 1013
1600 80
1700 646
1800 1001
1900 1982
2000 965
2100 1066
2200 803
2300 25
2400 10
2500 8
2600 5
2700 5
2800 2
2900 1
3000 0
Is it possible to get ranges with similar units in each range automatically?
Like:
1 [0-500] 4272
2 [600-700] 5949
3 [800-1400] 4339
4 [1500-1900] 4722
5 [>2000] 2890
I can group creating the ranges manually, it would be great if this could be automated. With the ranges, then grouping can be done like:
# Define the ranges manually
levels <- c(-Inf, 300, 500, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, Inf)
# Define labels for the groups
labels <- c("<300", "300-500", "500-800", "800-1000", "1000-1500", "1500-2000", "2000-2500", "2500-3000", "3000-3500", "3500-4000", "4000-4500", "4500-5000", "5000-6000", ">6000")
# Add a column with the GROUP
table$GROUP <- cut(table$PRICE, levels, labels, right=FALSE)
# Use dplyr to summarise the UNITS
x <- table %>% group_by(GROUP) %>% summarise (UNITS = sum(UNITS))