
I have a data set with a continuous variable, and I want to split it up into factors.

This is my code so far:

data$pgr<-as.character(data$pgr)
data$pgr[data$pgr <= 7] <- "<= 7 progesterone receptors"
data$pgr[data$pgr > 7 & data$pgr <= 32.5] <- "7-32.5 progesterone receptors"
data$pgr[data$pgr > 32.5 & data$pgr <= 131.8] <- "32.5-131.8 progesterone receptors"
data$pgr[data$pgr > 131.8] <- "> 131.8 progesterone receptors"
data$pgr<-as.factor(data$pgr)

The thing is, it worked for a different variable when I only used one double inequality, but it won't work for this one. The <= 7 and > 131.8 conditions both work.

I thought you had to use a double &&, but for my other variable it only worked with a single &. For this one, neither works.

Please could someone explain this to me/how to change my code? I would appreciate it.

learning
  • Does this answer your question? [How does R compare version strings with the inequality operators?](https://stackoverflow.com/questions/60476957/how-does-r-compare-version-strings-with-the-inequality-operators) – divibisan Jan 31 '22 at 17:28
  • The specific problem you're facing here is that once you convert the variable to character, the inequality operators compare its values as strings rather than as numbers. That's what the duplicate above is about. The solution for your _actual_ problem should be here: [Create categorical variable in R based on range](https://stackoverflow.com/q/2647639/8366499) or here: [Add column which contains binned values of a numeric column](https://stackoverflow.com/q/5570293/8366499) – divibisan Jan 31 '22 at 17:31
  • Why are you converting to character before doing the comparisons? – camille Jan 31 '22 at 19:29
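
To illustrate the comment above about string comparison, here is a small sketch with made-up values (`x` is hypothetical, not the asker's data). Once the numbers are stored as character strings, `<=` compares them as text. The sketch also shows `&`, the element-wise operator that logical subsetting needs.

```r
# Hypothetical values standing in for data$pgr
x <- c(5, 20, 100, 150)

# Numeric comparison: only 5 is <= 7
num_cmp <- x <= 7

# After as.character(), 7 is coerced to "7" and the comparison is done on text,
# so "20", "100" and "150" all sort before "7" because "2" and "1" precede "7"
chr_cmp <- as.character(x) <= 7

# `&` works element by element, which is what logical subsetting needs
in_range <- x > 7 & x <= 32.5

print(num_cmp)
print(chr_cmp)
print(in_range)
```

Keeping the column numeric until all the comparisons are done avoids the problem.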

1 Answer

`cut()` is your friend:

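# Example data: 100 random values between 1 and 200
data <- data.frame(pgr = runif(100, 1, 200))
# Bin the numeric column; by default each interval includes its upper break
data$pgr <- cut(data$pgr,
                breaks = c(-Inf, 7, 32.5, 131.8, Inf),
                labels = c("<= 7 progesterone receptors",
                           "7-32.5 progesterone receptors",
                           "32.5-131.8 progesterone receptors",
                           "> 131.8 progesterone receptors"))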
  • Thank you Javier! Are you able to explain the line "data <- data.frame(pgr = runif(100, 1, 200))" to me? – learning Feb 01 '22 at 14:03
  • `runif()` is a function to generate random uniform values. In this case, I am using it to generate 100 random values between 1 and 200. I am storing the result in a data.frame called `data` which contains just one column (pgr) with these values. It was just a way to have some actual data, similar to the ones you described. – Javier Herrero Feb 01 '22 at 15:59
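
A small sketch of what that line produces, followed by a check of where values on the breaks end up. `set.seed()` is added here only to make the draw reproducible (the seed value is arbitrary). The vector `vals` and the shortened labels are hypothetical, chosen for illustration. With `cut()`'s default `right = TRUE`, each interval includes its upper break, so 7 falls in the first bin and 32.5 in the second.

```r
set.seed(42)  # arbitrary seed, only so the example is reproducible
data <- data.frame(pgr = runif(100, 1, 200))
str(data)          # one numeric column, 100 rows
range(data$pgr)    # all values lie between 1 and 200

# Hand-picked values on and around the breaks (hypothetical, short labels)
vals <- c(7, 7.1, 32.5, 131.8, 200)
bins <- cut(vals,
            breaks = c(-Inf, 7, 32.5, 131.8, Inf),
            labels = c("<= 7", "7-32.5", "32.5-131.8", "> 131.8"))
print(data.frame(vals, bins))
```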