I'd like to make a new column whose value depends on other columns. There are three possible outcomes:

  1. If Distance < Min_disp, the value is 0
  2. Otherwise, if Distance < Max_disp, the value is Distance
  3. If Distance > Max_disp, the value is Max_disp

I have tried using an if-statement with multiple outcomes, but I receive a warning:

Warning messages:
1: In if (Noord_2015_moved$Distance < Noord_2015_moved$Min_disp) { :
  the condition has length > 1 and only the first element will be used
2: In if (Noord_2015_moved$Distance < Noord_2015_moved$Max_disp) { :
  the condition has length > 1 and only the first element will be used

And indeed it only prints "Max_disp".

This is the code I've used:

if (Noord_2015_moved$Distance < Noord_2015_moved$Min_disp) {
  0
} else if (Noord_2015_moved$Distance < Noord_2015_moved$Max_disp) {
  Noord_2015_moved$Distance
} else {
  Noord_2015_moved$Max_disp
}

I have also tried running it in three separate steps, but then I don't know how to tell R to assign to only part of df$column, and I get this error:

number of items to replace is not a multiple of replacement length

Noord_2015_moved <- mutate(Noord_2015_moved, Actual_disp = ifelse(Distance < Min_disp, 0, NA))
Noord_2015_moved$Actual_disp[Noord_2015_moved$Distance < Noord_2015_moved$Max_disp] <- Noord_2015_moved$Distance
Noord_2015_moved$Actual_disp[is.na(Noord_2015$Actual_disp)] <- Noord_2015_moved$Max_disp

And this is my data:

'data.frame':   301 obs. of  15 variables:
 $ Transmitter: Factor w/ 18 levels "A69-1601-22313",..: 1 1 1 1 1 1 1 2 2 2 ...
 $ Date       : Date, format: "2015-03-03" "2015-03-08" "2015-03-11" "2015-05-18" ...
 $ Date_time  : Factor w/ 279544 levels "1-03-15 0:00",..: 198302 258702 18684 85140 190788 182641 208718 26315 198759 205744 ...
 $ Receiver   : Factor w/ 17 levels "uitzetpunt 1-noord",..: 8 5 8 5 6 7 6 8 5 8 ...
 $ Station    : Factor w/ 17 levels "10","11","12",..: 15 12 15 12 13 14 13 15 12 15 ...
 $ Traject    : Factor w/ 53 levels "","10-10","10-9",..: 53 50 41 50 40 44 45 53 50 41 ...
 $ Interval   : num  83.4 12.7 42.6 25.2 217.4 ...
 $ Distance   : num  1540 6480 6480 6480 4690 4220 4220 1540 6480 6480 ...
 $ Min_speed  : num  0.02 0.51 0.15 0.26 0.02 0.73 0.52 0.01 0.02 0.02 ...
 $ Min_speed2 : num  0.00556 0.14167 0.04167 0.07222 0.00556 ...
 $ Length     : int  47 47 47 47 47 47 47 45 45 45 ...
 $ Activity   : chr  "Low" "Low" "Low" "Low" ...
 $ Moved      : chr  "Yes" "Yes" "Yes" "Yes" ...
 $ Min_disp   : num  160 4080 1200 2080 160 5840 4160 80 160 160 ...
 $ Max_disp   : num  240 6120 1800 3120 240 8760 6240 120 240 240 ...
Andrea
  • Possible duplicate of [the condition has length > 1 and only the first element will be used in if else statement](https://stackoverflow.com/questions/34053043/the-condition-has-length-1-and-only-the-first-element-will-be-used-in-if-else) – jogo Oct 21 '19 at 13:09
  • @jogo, I did see that thread, but it did not resolve my misunderstanding. – Andrea Oct 21 '19 at 13:17

2 Answers

if() isn't vectorized. It works on a single condition, not a whole vector. That's what the warning "the condition has length > 1 and only the first element will be used" is telling you. You could use if() for this purpose, but you would need to put it in a for loop to check each row one at a time. Doable, but not efficient.
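For illustration, here is a minimal sketch of that row-by-row loop. The data frame `df` and its values are made up for this example; only the column names are taken from the question.

```r
# Hypothetical toy data using the column names from the question
df <- data.frame(
  Distance = c(100, 200, 500),
  Min_disp = c(160, 160, 160),
  Max_disp = c(240, 240, 240)
)

# Pre-allocate the result, then test one row at a time with a scalar if()
df$Actual_disp <- numeric(nrow(df))
for (i in seq_len(nrow(df))) {
  if (df$Distance[i] < df$Min_disp[i]) {
    df$Actual_disp[i] <- 0
  } else if (df$Distance[i] < df$Max_disp[i]) {
    df$Actual_disp[i] <- df$Distance[i]
  } else {
    df$Actual_disp[i] <- df$Max_disp[i]
  }
}

print(df$Actual_disp)
```

Each condition inside the loop has length 1, so the warning does not appear.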

ifelse is a vectorized version of if, and is a good fit for a problem like this. Here you would probably nest two ifelse calls:

Noord_2015_moved$Actual_disp = ifelse(
  Noord_2015_moved$Distance < Noord_2015_moved$Min_disp, 0, 
  ifelse(Noord_2015_moved$Distance < Noord_2015_moved$Max_disp, Noord_2015_moved$Distance,
    Noord_2015_moved$Max_disp
  ))
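To see what the nested ifelse returns, here is a small sketch on made-up data (the `toy` data frame is hypothetical, only the column names come from the question). The third row has Distance equal to Max_disp, which falls through to the final branch.

```r
# Hypothetical toy data; the third row has Distance == Max_disp
toy <- data.frame(
  Distance = c(100, 200, 240, 500),
  Min_disp = c(160, 160, 160, 160),
  Max_disp = c(240, 240, 240, 240)
)

# Same nested ifelse as above, evaluated element-wise over all rows
toy$Actual_disp <- ifelse(
  toy$Distance < toy$Min_disp, 0,
  ifelse(toy$Distance < toy$Max_disp, toy$Distance, toy$Max_disp)
)

print(toy$Actual_disp)
```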

I see you are already using mutate in one place. If you're using dplyr, mutate adds a column to the data frame and lets you refer to existing columns without typing out the data frame's name. This code is equivalent to my code above:

library(dplyr)  # provides mutate() and %>%

Noord_2015_moved = Noord_2015_moved %>% mutate(
  Actual_disp = ifelse(Distance < Min_disp, 0, 
    ifelse(Distance < Max_disp, Distance, Max_disp)
  )
)
Gregor Thomas

In addition to nesting ifelse, you can use dplyr::case_when, which handles multiple conditions without nesting:

Noord_2015_moved = Noord_2015_moved %>% mutate(
  Actual_disp = case_when(
    Distance < Min_disp ~ 0,
    Distance < Max_disp ~ Distance,
    Distance > Max_disp ~ Max_disp,
    # anything not matched above, e.g. Distance == Max_disp, becomes NA
    TRUE ~ NA_real_
  )
)

Here is a short reference.

slava-kohut