
I have a problem with creating variables that should contain a certain range of values. I want to split my variable into 3 categories, namely small, medium and big. Some context: I have a variable named obj_hid_WOONOPP, which is a size in m2 and ranges from 16 to 375. My dataset is called datalogitvar.

I'm sorry I have no reproducible code, but since I think it's a rather simple question, I hope it can be answered nonetheless. The code I'm using is as follows:

datalogitvar$size_small<-  as.numeric(obj_hid_WOONOPP>="15" & obj_hid_WOONOPP<="75" )
datalogitvar$size_medium<-  as.numeric(obj_hid_WOONOPP>="76" & obj_hid_WOONOPP<="100" )
datalogitvar$size_large<-  as.numeric(obj_hid_WOONOPP>="101")

When I run this, I do get a result, just not the result I'm hoping for. For example, the small category also contains very high numbers. Because I define "75", it seems to also pick up values such as "175", since they contain "75". I suspect R is reading my data as text rather than as numbers. However, I do use as.numeric, so I'm a bit confused. Can someone explain how to make sure these 3 variables get the proper ranges? I feel I'm close, but so far the result is useless.

Thank you so much for helping.

nghauran
Thundersheep
  • Why are your numbers imported as strings in the first place? That is the problem you should address first. It's hard to help you without a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to see what's going on. But you are only running `as.numeric()` on the Boolean comparison, not on the character values. – MrFlick Oct 05 '17 at 15:49
  • Agreed with @MrFlick: your issue comes from inconsistently referencing numbers as character and as numeric. It is specific to them being stored as character, so if you convert `obj_hid_WOONOPP` with `as.numeric` you should be good to go. – Mako212 Oct 05 '17 at 15:51
  • I just used `length(obj_hid_WOONOPP)`, which returns `[1] 90127`, so it seems to be alright there. That does not seem to be causing the problem. – Thundersheep Oct 05 '17 at 15:52
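`length()` only counts elements, so it cannot show whether obj_hid_WOONOPP is stored as numbers or as text. Below is a minimal sketch of checking the storage type with `class()` and `is.numeric()` instead. The two vectors are made-up stand-ins for the real column. `str(datalogitvar)` would list the type of every column at once.

```r
# Hypothetical stand-ins: the same sizes stored once as numbers, once as text.
as_num  <- c(16, 75, 175, 375)
as_text <- c("16", "75", "175", "375")

length(as_num)       # 4
length(as_text)      # 4, so length() cannot tell the two apart

class(as_num)        # "numeric"
class(as_text)       # "character"
is.numeric(as_text)  # FALSE
```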

3 Answers


For a question like this you can replicate your problem with a publicly available dataset like mtcars.

Regarding your code: 1) You will need to name the dataset on the right-hand side of your code as well, i.e. DATASET$obj_hid_WOONOPP. 2) Why are you using quotes around your numeric values? The quotes turn the numbers into character strings. R then converts obj_hid_WOONOPP to character too and compares the values alphabetically instead of numerically.

I think you want to use something like the code I've written below.

mtcars$mpg_small  <- as.numeric(mtcars$mpg >= 15 & mtcars$mpg <= 20)
mtcars$mpg_medium <- as.numeric(mtcars$mpg > 20 & mtcars$mpg <= 25)
mtcars$mpg_large  <- as.numeric(mtcars$mpg > 25)
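The same fix applied to the question's own names might look like the sketch below. It assumes datalogitvar is a data frame with a numeric column obj_hid_WOONOPP. The values here are made up, not the asker's data.

```r
# Toy stand-in for the question's data; the values are made up.
datalogitvar <- data.frame(obj_hid_WOONOPP = c(16, 75, 76, 100, 101, 175, 375))
x <- datalogitvar$obj_hid_WOONOPP

# No quotes around the limits, so every comparison is numeric.
# "> 75" rather than ">= 76" also catches non-integer sizes such as 75.5.
datalogitvar$size_small  <- as.numeric(x >= 15 & x <= 75)
datalogitvar$size_medium <- as.numeric(x > 75 & x <= 100)
datalogitvar$size_large  <- as.numeric(x > 100)

datalogitvar
```

Using `>` for each lower limit and `<=` for each upper limit, as in the mtcars version, leaves no gaps between the categories. Every row therefore falls into exactly one of them.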
Joe
  • Thanks, that was indeed my fault! I put quotation marks around my numbers. So stupid. I copied my lines from a previous piece where I quoted text. Thank you so much for the tip. Also, I attached my data, so I didn't need to use dataset$variable, but thanks for that tip as well. It's better to write it the way you suggested anyway. – Thundersheep Oct 05 '17 at 16:03

Just to illustrate your problem:

a <- "75"
b <- "175"

a > b
# [1] TRUE   i.e. "75" > "175"

a < b
# [1] FALSE  i.e. "75" < "175" is false

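Strings don't compare the way you'd expect. They are compared character by character (alphabetically), not by numeric value, so "75" sorts after "175" because "7" comes after "1".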
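This also happens in the question's code even if obj_hid_WOONOPP itself is numeric. When one side of a comparison is character, R converts the other side to character as well (see `?Comparison`). Here is a small sketch with made-up sizes:

```r
# Made-up sizes stored as numbers.
sizes <- c(16, 75, 175, 375)

# Quoted limit: R converts `sizes` to "16", "75", "175", "375" and compares
# alphabetically, so 175 and 375 also count as "<= 75".
sizes <= "75"   # TRUE TRUE TRUE TRUE

# Unquoted limit: a plain numeric comparison.
sizes <= 75     # TRUE TRUE FALSE FALSE
```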

Mako212

Two ideas come to mind, though an example of code would be helpful.

First, look into the documentation for cut(), which converts a numeric vector into a factor based on cut points that you set.
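Below is a minimal sketch of cut() on made-up values. It uses the question's limits (up to 75, above 75 up to 100, above 100) as breaks. The labels are an assumption.

```r
# Made-up sizes in m2 spanning the question's 16-375 range.
sizes <- c(16, 75, 76, 100, 101, 175, 375)

# With the default right = TRUE the intervals are closed on the right:
# (-Inf, 75], (75, 100] and (100, Inf].
size_class <- cut(sizes,
                  breaks = c(-Inf, 75, 100, Inf),
                  labels = c("small", "medium", "large"))

table(size_class)

# A 0/1 indicator can still be derived from the factor if needed.
size_small <- as.numeric(size_class == "small")
```

A single factor is often easier to use than three indicator columns. For example, `glm()` builds the dummy variables from a factor itself.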

Second, as @MrFlick points out, your code could be rewritten so that as.numeric() is first applied to the character vector of strings you want to convert to numbers. Only THEN should Boolean comparisons such as > or & be performed.

To build on @Joe's answer:

mtcars$mpg_small  <- (as.numeric(mtcars$mpg) >= 15 & 
                     (as.numeric(mtcars$mpg) <= 20))

Also be careful: if your vector of strings obj_hid_WOONOPP contains values that cannot be coerced to numbers, as.numeric() will turn them into NA (with a warning).
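Below is a sketch of converting first, then comparing, and finding the entries that failed to convert. The character vector and its "n/a" entry are hypothetical. They are not taken from the asker's data.

```r
# Hypothetical character column with one entry that is not a number.
woonopp_chr <- c("16", "75", "175", "n/a")

num <- as.numeric(woonopp_chr)   # warning: NAs introduced by coercion
num                              # 16 75 175 NA

# Convert first, then compare: the comparisons are now numeric.
size_small <- as.numeric(num >= 15 & num <= 75)
size_small                       # 1 1 0 NA

# Entries that were present but could not be converted.
woonopp_chr[is.na(num) & !is.na(woonopp_chr)]   # "n/a"
```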

user8701090