1

I have a data frame for which i need to create a "recommendations" column based on the values of other columns. the code for which is like this :

  new_df<-
df1 %>% mutate(
  recommendation =
    case_when(
      optimal_stay_flag == "Too Short" &
        hour(Depart) < 18 &
        lead(df1$`Max Speed`) < 18 ~ "recommendation 1",
      optimal_stay_flag == "Too Short" &
        hour(Depart) >= 18 &
        lead(df1$`Max Speed`) < 18 ~ "recommendation 2!"
    )
) %>% relocate(recommendation, .after = `Max Speed`)

This works how I want it expect for one exception. I would need the output of the first recommendation (recommendation 1) to be lagged and the output of the second recommendation (recommendation 2) to be leading. basically I want the recommendation 1 to be one cell BELOW where it is currently outputting and recommendation 2 to be one cell ABOVE where it is currently outputting.

I have tried using lag() and lead() but don't know how to properly implement it into this code. I am also open to other approaches other than mutate/case_when.

find a sample of my data

df1 <- structure(list(Arrival = structure(c(1663833600, 1663912800, 
1664096400, 1664287200, 1664323200, 1664787600, 1665385200, 1665486000, 
1665651600, 1665817200, 1666094400, 1666252800, 1666335600, 1666422000, 
1666681200, 1667199600, 1667386800, 1667642400, 1667721600, 1667808000
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), Depart = structure(c(1663884000, 
1663952400, 1664132400, 1664323199, 1664395200, 1664816400, 1665417600, 
1665511200, 1665673200, 1665849600, 1666123200, 1666292400, 1666371600, 
1666458000, 1666706400, 1667232000, 1667415600, 1667678400, 1667764800, 
1667840400), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    optimal_stay_flag = c("Too Long", "Optimal", "Optimal", "Optimal", 
    "Too Long", "Optimal", "Optimal", "Too Short", "Too Short", 
    "Optimal", "Optimal", "Optimal", "Optimal", "Optimal", "Too Short", 
    "Optimal", "Optimal", "Optimal", "Optimal", "Optimal"), duration_at_port = c(14, 
    11, 10, 9.99972222222222, 20, 8, 9, 7, 6, 9, 8, 11, 10, 10, 
    7, 9, 8, 10, 12, 9), `Port Booking Status` = c("Confirmed", 
    "Confirmed", "Confirmed", "Confirmed", "Confirmed", "Confirmed", 
    "Confirmed", "Confirmed", "Confirmed", "Confirmed", "Confirmed", 
    "Confirmed", "Confirmed", "Confirmed", "Confirmed", "Confirmed", 
    "Confirmed", "Confirmed", "Confirmed", "Confirmed"), Distance = c(212, 
    81, 668, 743, NA, 1958, 571, 260, 642, 669, 1252, 583, NA, 
    120, 1097, 552, 732, 1099, NA, NA), `Max Speed` = c(17.67, 
    16.2, 18.56, 17.69, 0, 17.88, 16.08, 20.87, 19.68, 18.54, 
    19.41, 17.05, 7.4, 10, 17.98, 15.77, 18.3, 18.26, 7.4, 7.4
    )), row.names = c(NA, 20L), class = "data.frame")
marcelklib
  • 91
  • 5
  • 1
    please provide your dataset – yuliaUU Sep 22 '22 at 16:50
  • 1
    marcelklib, (almost) never use `df1$` in a pipe stemming from `df1`; at best it is a little inefficient, but if there is anything in between `df1 %>%` and the `df1$`-reference that reorders, modifies, or deletes content from the original `df1`, the operation will either loudly fail or silently corrupt your data. – r2evans Sep 22 '22 at 17:05
  • BTW, you are referencing `head(\`Max Speed\`)` which is almost certainly wrong, should that be `lead(\`Max Speed\`)`? – r2evans Sep 22 '22 at 17:05
  • After those fixes, it's difficult to help without a more reproducible question, as yuliaUU already said. Please see https://stackoverflow.com/q/5963269 for suggestions on how to include it, including `dput(head(x))` and `read.table(...)`; the former is likely the simplest for you, just copy its output and paste into a [`\`\`\`\ncode block\n\`\`\``](https://stackoverflow.com/editing-help). Thanks! – r2evans Sep 22 '22 at 17:08
  • thank you all for the comments, I was trying to figure out how to paste my data in a readable way, best way i could was using dput() hope this helps! – marcelklib Sep 22 '22 at 17:21

1 Answers1

1

Two things, brought forward from the comments:

  • (almost) never use df1$ dollar-referencing inside a pipeline stemming from the df1 dataset. At best, it is inefficient, but if any operations (dplyr verbs) between the df1 %>% and the line referencing df1$ results in reordering, updated data, additional content, or filtered content, then the results will either loudly fail (better) or silently corrupt your data (worse). There are exceptions when this is meaningful, but they are few and certainly not here.
  • You have head(`Max Speed`) in your code, which should likely be lead. It's not clear if this is a typo in your question or your real code, I'll fix it here.

Ultimately, the trick is to use lead or lag on the entire conditional, not just on one component.

Since we don't have your data, I'll demonstrate the lead/lag component on mtcars. My logic:

  • "recommendation 1" if the previous row's disp < 77 & gear < 5;
  • "recommendation 2!" if the next row's disp > 400 & gear > 5;
  • NA otherwise
library(dplyr)
mtcars %>%
  mutate(
    recommendation = case_when(
      lead(disp < 77 & cyl < 5) ~ "recommendation 1",
      lag(disp > 400 & cyl > 5) ~ "recommendation 2!"
    )
  )
#                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb    recommendation
# Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4              <NA>
# Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4              <NA>
# Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1              <NA>
# Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1              <NA>
# Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2              <NA>
# Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1              <NA>
# Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4              <NA>
# Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2              <NA>
# Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2              <NA>
# Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4              <NA>
# Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4              <NA>
# Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3              <NA>
# Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3              <NA>
# Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3              <NA>
# Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4              <NA>
# Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 recommendation 2!
# Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4 recommendation 2!
# Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1  recommendation 1
# Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2  recommendation 1
# Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1              <NA>
# Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1              <NA>
# Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2              <NA>
# AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2              <NA>
# Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4              <NA>
# Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2              <NA>
# Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1              <NA>
# Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2              <NA>
# Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2              <NA>
# Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4              <NA>
# Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6              <NA>
# Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8              <NA>
# Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2              <NA>
r2evans
  • 141,215
  • 6
  • 77
  • 149