I have the following problem:

I have a vector called rain (from a data frame df) that looks like the following:

    [,1]
[1,]  0.0
[2,]  0.0
[3,]  0.4
[4,]  3.0
[5,]  0.1
[6,]  5.0
[7,]  19.0
[8,]  0.1
[9,]  7.2
[10,] 23.0

The vector's values range from 0 to 44.

The data frame has about 16000 rows.

First, I want to cut the vector into 4 intervals: (0), (0, 2.5), [2.5, 10), [10, 50).

(0) means that I want all values that are exactly zero in an interval of their own. (0) will mean no rain, (0, 2.5) will mean light rain, and so on.

Second, I want to turn the continuous variable rain into a categorical variable, so that the vector will look like this:

   [,1]
[1,]  no rain
[2,]  no rain
[3,]  light rain 
[4,]  medium rain
[5,]  light rain
[6,]  medium rain
[7,]  strong rain
[8,]  light rain 
[9,]  medium rain 
[10,] strong rain 

I have tried the following:

df %>% mutate(rain_bins = cut(rain, breaks = c(-0.1, 0, 2.5, 10, 50)))

But I just don't know how I can overwrite df$rain so that I have the vector I want.

(I am planning to do this because I want to fit a logistic regression.)

Thank you in advance

1 Answer

Like this?

library(tidyverse)
data.frame(rain = c(0,0,0.4,3,0.1,5,19,0.1,7.2,23)) %>%
  mutate(rain.cat = cut(rain, 
                        breaks = c(-0.1,0,2.5,10,50), 
                        labels = c("no rain", "light rain", "medium rain", "strong rain")))

#    rain    rain.cat
# 1   0.0     no rain
# 2   0.0     no rain
# 3   0.4  light rain
# 4   3.0 medium rain
# 5   0.1  light rain
# 6   5.0 medium rain
# 7  19.0 strong rain
# 8   0.1  light rain
# 9   7.2 medium rain
#10  23.0 strong rain
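
To overwrite rain itself rather than add a new column, assign the result of mutate() back to df and reuse the column name. Note also that cut() closes intervals on the right by default, so the breaks above give (0, 2.5], (2.5, 10], (10, 50], and a value of exactly 2.5 or 10 lands one category lower than the question's [2.5, 10) and [10, 50) would put it. A minimal sketch that follows the question's boundaries, assuming dplyr::case_when() is acceptable in place of cut() (the name rain_levels is only illustrative, and values are assumed to be non-negative):

```r
library(dplyr)

# Same sample values as in the question
df <- data.frame(rain = c(0, 0, 0.4, 3, 0.1, 5, 19, 0.1, 7.2, 23))

# Illustrative name: the category labels in their intended order
rain_levels <- c("no rain", "light rain", "medium rain", "strong rain")

# Assigning back to df and reusing the name `rain` overwrites the column
df <- df %>%
  mutate(rain = factor(
    case_when(
      rain == 0  ~ "no rain",      # exactly zero
      rain < 2.5 ~ "light rain",   # (0, 2.5), assuming no negative values
      rain < 10  ~ "medium rain",  # [2.5, 10)
      rain >= 10 ~ "strong rain"   # [10, Inf); NA stays NA
    ),
    levels = rain_levels
  ))

print(df)
```

Because the result is a factor whose first level is "no rain", a model fitted with glm() would use "no rain" as the reference category under R's default treatment contrasts.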
Wimpel