
I'm attempting to apply size classes to trees based on their diameter at breast height (dbh) measurement.

I have the following code to classify them, repeated 19 times.

    data_dbh$size_class_12 = ifelse(data_dbh$dbh1 %in% c(5.0:9.9), "A",
                            ifelse(data_dbh$dbh1 %in% c(10.0:19.9), "B",
                                   ifelse(data_dbh$dbh1 %in% c(20.0:39.9), "C",
                                          ifelse(data_dbh$dbh1 > 39.9, "D", NA))))

Some of the newly created columns contain NA even when the value in the corresponding dbh column is within the specified `%in%` range. For example, 14.9 ought to have returned B; the same happens in the other `%in%` ranges.

I've tried str_trim with side = "both" to clear possible spaces, as well as as.numeric and as.character, but nothing changes.

I'm fairly new to R and not good at understanding programming languages in general, so if any answers could be kept as simple as possible, that would be great.

Ronak Shah
D. Ng
  • If I understand you correctly, R would read `c(1.5:3.8)` as `c(1:3)`? – D. Ng Jan 13 '21 at 02:46
  • Please ignore my previous comment ... a better way to say it (that is more accurate) is that the `:` operator will always assume `by=1`, so `a:b` is really `seq(a, b, by = 1)`. – r2evans Jan 13 '21 at 02:50
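To illustrate the comment above, here is a small sketch of what the `:` operator produces for the ranges used in the question. The variable names `a_range` and `b_range` are made up for this example.

```r
# The `:` operator starts at its left-hand value and steps by 1,
# so decimal values between the steps are never generated.
a_range <- 5.0:9.9
print(a_range)              # 5 6 7 8 9

b_range <- 10.0:19.9
print(14.9 %in% b_range)    # FALSE: 14.9 is not one of 10, 11, ..., 19
print(14 %in% b_range)      # TRUE: whole numbers in the range do match
```

This is why a dbh of 14.9 ends up as NA in the original nested `ifelse` code.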

1 Answer

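You should not use `%in%` or `==` for floating-point comparisons. Both test for exact equality, which is unreliable for decimal values; read "Why are these numbers not equal?". In addition, `10.0:19.9` steps by 1, so it only contains 10, 11, ..., 19, and a value such as 14.9 can never match it.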
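As a minimal illustration of why exact equality is unreliable for decimals (not specific to the dbh data), the sketch below compares a computed value with the literal it appears to equal. `all.equal` is base R's tolerance-based comparison.

```r
x <- 0.1 + 0.2

print(x == 0.3)                    # FALSE: the stored values differ slightly
print(isTRUE(all.equal(x, 0.3)))   # TRUE: equal within a small tolerance

# Printing more digits than the default shows the hidden difference.
print(format(x, digits = 17))
```

When equality of decimals really is needed, wrapping `all.equal` in `isTRUE` is one common approach.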

You can simplify your code here by using `cut` instead of nested `ifelse` statements.

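    # By default cut() uses right = TRUE, so the intervals are
    # (5,10], (10,20], (20,39.9] and (39.9,Inf].
    data_dbh$size_class_12 <- cut(data_dbh$dbh1,
                                  breaks = c(5, 10, 20, 39.9, Inf),
                                  labels = c('A', 'B', 'C', 'D'))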
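If a dbh of exactly 5.0 should fall in A and exactly 10.0 in B, as in the question's classes, one possible variant is to close the intervals on the left with `right = FALSE`. This is a sketch under two assumptions: the sample `data_dbh` below is invented for illustration, and dbh is recorded to one decimal place, so a break at 40 is equivalent to "greater than 39.9".

```r
# Hypothetical sample data; the real data_dbh comes from the question.
data_dbh <- data.frame(dbh1 = c(4.2, 5.0, 9.9, 10.0, 14.9, 39.9, 40.0))

# right = FALSE gives the intervals [5,10), [10,20), [20,40), [40,Inf).
# Values below 5 fall outside every interval and become NA.
data_dbh$size_class_12 <- cut(data_dbh$dbh1,
                              breaks = c(5, 10, 20, 40, Inf),
                              labels = c("A", "B", "C", "D"),
                              right = FALSE)

print(data_dbh)
```

Note that `cut` returns a factor; wrap the call in `as.character()` if a character column is preferred.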
Ronak Shah
  • Thanks for the help, it works like a charm! Just had to adjust the break values slightly so 10.0 turns up as B rather than A. In the case of the ifelse statements (which I'm more familiar with), what would be the appropriate "symbols" to use for floating decimals then? – D. Ng Jan 13 '21 at 02:37
  • You can't. The problem with decimals is you actually "see" something but they are stored in a different way. So you might see a value as 10.1 but actually it is 10.10000000009 so when you compare it with `%in%` 10.1 it will not match. – Ronak Shah Jan 13 '21 at 03:12
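For anyone who prefers to stay with `ifelse`, range checks can be written with `>=` and `<` rather than `%in%`, since these do not depend on exact equality with a list of values. The following is only a sketch: the vector `dbh` is made-up sample data, and the upper bound of 40 assumes measurements recorded to one decimal place.

```r
# Hypothetical sample values, including one below 5 and one missing value.
dbh <- c(4.2, 5.0, 9.9, 10.0, 14.9, 39.9, 40.0, NA)

# Each test checks a half-open range: lower bound included, upper excluded.
size_class <- ifelse(dbh >= 5  & dbh < 10, "A",
              ifelse(dbh >= 10 & dbh < 20, "B",
              ifelse(dbh >= 20 & dbh < 40, "C",
              ifelse(dbh >= 40, "D", NA))))

print(size_class)   # NA "A" "A" "B" "B" "C" "D" NA
```

Values that match none of the ranges, and missing values, come out as NA.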