
I have a data frame with a `Size` column containing numbers. I would like to replace these numbers with small, medium or large based on the ranges small = 1:10, medium = 11:49 and large = 50:200.

I have tried using

table$Size <- factor(table$Size,
                    levels = c(1:10),c(11:49),c(50:200),
                    labels = c("small"),c("medium"),c("large"))

I understand why this doesn't work. I have also tried `str_replace_all`, but this also produces an error.

Is there a way to replace numbers within these ranges with the respective label?

jjgg112244
    You should use `cut`/`findInterval` instead https://stackoverflow.com/questions/12979456/r-code-to-categorize-age-into-group-bins-breaks and https://stackoverflow.com/questions/5746544/cut-by-defined-interval – Ronak Shah Sep 02 '20 at 10:28
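For reference, here is a minimal sketch of the `findInterval()` route suggested in the comment. It assumes whole-number sizes in 1:200, and the data frame `df` with its values is hypothetical. `findInterval()` returns the index of the interval each value falls into, and that index can then select a label:

```r
# Hypothetical example data standing in for the question's data frame
df <- data.frame(Size = c(1, 10, 11, 49, 50, 200))

labs <- c("small", "medium", "large")

# findInterval() returns i when breaks[i] <= x < breaks[i + 1]:
# [1, 11) -> 1, [11, 50) -> 2, [50, 201) -> 3, >= 201 -> 4, < 1 -> 0
idx <- findInterval(df$Size, c(1, 11, 50, 201))

# Index 0 would silently drop elements when subsetting, so turn it into NA;
# index 4 is out of range for labs and already yields NA
idx[idx == 0] <- NA

df$Size <- factor(labs[idx], levels = labs)
df$Size
```

Adding the upper break at 201 keeps values above 200 from being labelled "large".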

2 Answers


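The `cut()` function converts a numeric variable to a factor. You supply `breaks` to say where the cuts should happen (this replaces your attempt at `levels`), and then apply your `labels`. The `right` argument controls whether each interval is closed on the right (`right = TRUE`, the default) or on the left (`right = FALSE`). With `breaks = c(0, 10, 49, 200)` and `right = TRUE`, the intervals are (0, 10], (10, 49] and (49, 200], which for whole numbers are exactly 1:10, 11:49 and 50:200.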

set.seed(10)
x <- sample(1:200, 1000, replace = TRUE)
summary(x)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#    1.00   50.75  101.00  101.57  153.00  200.00 
x <- cut(x, breaks = c(0, 10, 49, 200),
         labels = c("small", "medium", "large"),
         right = TRUE)
summary(x)
#  small medium  large 
#     51    189    760 
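A quick way to check the boundaries is to cut the edge values directly. This sketch uses hypothetical test values. It also shows that `right = FALSE` gives the same grouping if the breaks are moved to the left endpoints of each range:

```r
edges <- c(1, 10, 11, 49, 50, 200)
labs <- c("small", "medium", "large")

# right = TRUE: intervals (0, 10], (10, 49], (49, 200]
closed_right <- cut(edges, breaks = c(0, 10, 49, 200), labels = labs, right = TRUE)

# right = FALSE: intervals [1, 11), [11, 50), [50, 201)
closed_left <- cut(edges, breaks = c(1, 11, 50, 201), labels = labs, right = FALSE)

closed_right
identical(closed_right, closed_left)
```

The two sets of breaks agree only for whole numbers. A value such as 10.5 is "medium" with the first set but "small" with the second, so choose the breaks to match how you want non-integer sizes handled.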

I also want to point out an issue with your code. In your line `labels = c("small"),c("medium"),c("large")`, the commas are outside the `c()`. All elements of a vector must go inside the same `c()` call. The same applies to your `levels` argument:

 labels = c("small", "medium", "large")

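When the commas sit outside the parentheses, only `c("small")` is passed as `labels`. `c("medium")` and `c("large")` become separate, unnamed arguments that R matches positionally to the remaining arguments of `factor()` (`exclude`, `ordered`, `nmax`). Together with the extra vectors from `levels`, there are more of them than `factor()` has arguments, which is why the call fails with an "unused argument" error.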
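If you want to stay close to your original `factor()` attempt, here is one option. It is a sketch that assumes R 3.4.0 or later, where `factor()` accepts duplicated `labels` and merges the corresponding levels. List every value in `levels` and repeat each label once per value it covers (the `sizes` vector is hypothetical):

```r
# Hypothetical sizes standing in for table$Size
sizes <- c(156, 17, 128, 7, 77, 112)

size_fac <- factor(sizes,
                   levels = 1:200,
                   # 10 small (1:10), 39 medium (11:49), 151 large (50:200)
                   labels = rep(c("small", "medium", "large"), times = c(10, 39, 151)))
size_fac
levels(size_fac)
```

Because the matching is exact, values outside 1:200 (or non-integers) become `NA`.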

Ben Norris

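Alternatively, convert the column to a factor and recode its levels by assigning a named list to `levels()`. Each list element gives a new level name and the old values that map to it.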

table1$Size.fac <- factor(table1$Size)

levels(table1$Size.fac) <- list("small" = 1:10,
                            "medium" = 11:49,
                            "large" = 50:200)

table1
#   Size Size.fac
# 1  156    large
# 2   17   medium
# 3  128    large
# 4    7    small
# 5   77    large
# 6  112    large

Data:

table1 <- structure(list(Size = c(156L, 17L, 128L, 7L, 77L, 112L)),
                    row.names = c(NA, 6L), class = "data.frame")
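The list approach matches the existing levels exactly, so it suits whole-number sizes. This small sketch uses hypothetical extra values to show what happens to sizes that are not listed in any group. They become `NA` instead of being assigned to a category:

```r
# Hypothetical sizes: 250 is out of range, 10.5 is not a whole number
sizes <- c(7, 17, 156, 250, 10.5)
size_fac <- factor(sizes)

# Levels not covered by any list element are dropped and their values become NA
levels(size_fac) <- list(small = 1:10, medium = 11:49, large = 50:200)
size_fac
```

If your data can contain non-integer sizes, the `cut()` approach in the other answer handles them by interval instead of by exact value.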
jay.sf