0

This question: splitting a continuous variable into groups of equal number of elements - return numeric vector from bin values explains how to split a variable into bins, but here I want to split a variable based on summarized frequencies in another variable. Is it possible in r?

I have a data.frame with two columns: one with prices and another with sum of sales in units.

PRICE   UNITS
300     46 
400     1817 
500     2409 
600     1667 
700     4282 
800     2301 
900     608 
1000    335 
1100    235 
1200    90 
1300    180 
1400    590 
1500    1013 
1600    80 
1700    646 
1800    1001 
1900    1982 
2000    965 
2100    1066 
2200    803 
2300    25 
2400    10 
2500    8 
2600    5 
2700    5 
2800    2 
2900    1 
3000    0 

Is it possible to get ranges with similar units in each range automatically?

Like:

1 [0-500] 4272
2 [600-700] 5949
3 [800-1400] 4339
4 [1500-1900] 4722
5 [>2000] 2890

I can group creating the ranges manually, it would be great if this could be automated. With the ranges, then grouping can be done like:

# Define the ranges manually
levels <- c(-Inf, 300, 500, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, Inf)
# Define labels for the groups
labels <- c("<300", "300-500", "500-800", "800-1000", "1000-1500", "1500-2000", "2000-2500", "2500-3000", "3000-3500", "3500-4000", "4000-4500", "4500-5000", "5000-6000", ">6000")
# Add a column with the GROUP
table$GROUP <- cut(table$PRICE, levels, labels, right=FALSE)

# Use dplyr to summarise the UNITS
x <- table %>% group_by(GROUP) %>% summarise (UNITS = sum(UNITS))
Spoolly
  • 1
  • 1
  • Welcome to Stack Overflow. If you have tried to write some codes, would you be able to share that? – jazzurro Jan 26 '18 at 01:12
  • I think `Hmisc::cut2` does this. Or you could combine `cut` and `quantile`. – Gregor Thomas Jan 26 '18 at 01:27
  • I can group creating the ranges manually, it would be great if this could be automated. With the ranges, then grouping can be done like: ` levels <- c(-Inf, 300, 500, 800, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 6000, Inf) labels <- c("<300", "300-500", "500-800", "800-1000", "1000-1500", "1500-2000", "2000-2500", "2500-3000", "3000-3500", "3500-4000", "4000-4500", "4500-5000", "5000-6000", ">6000") table$GROUP <- cut(table$PRICE, levels, labels, right=FALSE) x <- table %>% group_by(GROUP) %>% summarise (UNITS = sum(UNITS))` – Spoolly Jan 26 '18 at 01:27
  • I changed the group target as the main point of this one is the **automatic creation of roughly equal bin sizes**. If you try the solutions there and are unsatisfied, edit an example of that dissatisfaction into this question and we can reopen. – Gregor Thomas Jan 26 '18 at 01:32

0 Answers0