
I found somewhat similar examples here and here, but I couldn't work out how to apply them to the problem I am trying to solve.

What I would like to do is use mutate and case_when to create a new column holding a category label (e.g., "category_1") that depends on the values in a different column. Since the number of values may change, I want the case_when to be dynamic.

The problem is that each iteration of the loop works on its own, but every new iteration overwrites the values assigned by the previous ones, so only the last iteration's category survives. How can I use case_when in a loop so that each iteration keeps the values assigned by earlier iterations instead of overwriting them?

Here is a reproducible example:

library(tidyverse)

# Use built-in data frame for reproducible example
my_df <- mtcars

# Create sequence to reference beginning and end ranges within mpg values
mpg_vals <- sort(mtcars$mpg)

beg_seq <- seq(1, 31, 4)
end_seq <- seq(4, 32, 4)

# Create loop to fill in mpg category
for(i in 1:8){
  my_df <- my_df %>%
    mutate(mpg_class = case_when(
      mpg %in% mpg_vals[beg_seq[i]:end_seq[i]] ~ paste0("category_", i)
    ))
  
  # Observe loop values
  print(mpg_vals[beg_seq[i]:end_seq[i]])
  print(paste0("category_", i))
}
Julien
DaveM
  • In your `case_when`, you can specify what value should be used when none of the conditions is met. To keep the previous value, add `TRUE ~ mpg_class` as the last case: `case_when(..., TRUE ~ mpg_class)`. Before the loop, initialize `mpg_class` first, e.g. `my_df$mpg_class = NA_character_` – jav Sep 04 '22 at 06:16
  • Also, note that `mpg_vals` has duplicates (e.g. 22.8), so a value that falls in two ranges can still be overwritten by a later iteration – jav Sep 04 '22 at 06:28
  • Ah, of course! Much appreciated. If you'd like to submit as an answer I will accept it. – DaveM Sep 04 '22 at 16:46
  • Do consider the answer given below by @Jon as it may give you what you need without a loop, unless you intend to do something more complicated – jav Sep 04 '22 at 16:56
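A minimal sketch of the fix suggested in the comments, assuming dplyr is installed: initialize mpg_class before the loop and add `TRUE ~ mpg_class` as the final case. case_when() returns NA for rows where no condition matches, which is why each iteration of the original loop wiped out the earlier categories; the fallback case keeps them instead. As jav points out, 22.8 sits in two ranges, so both 22.8 rows end up with the later label (category_7).

```r
library(dplyr)

my_df <- mtcars
mpg_vals <- sort(mtcars$mpg)
beg_seq <- seq(1, 31, 4)
end_seq <- seq(4, 32, 4)

# Initialize the column before the loop so the fallback case has a value to keep
my_df$mpg_class <- NA_character_

for (i in seq_along(beg_seq)) {
  my_df <- my_df %>%
    mutate(mpg_class = case_when(
      mpg %in% mpg_vals[beg_seq[i]:end_seq[i]] ~ paste0("category_", i),
      TRUE ~ mpg_class  # rows not matched in this iteration keep their earlier label
    ))
}
```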

1 Answer


Edit:

If I understand the question correctly, you want each block of four mpg rankings to get a new category. You might use:

my_df %>%
    mutate(mpg_class = paste("category", 1 + min_rank(mpg) %/% 4))

That produces:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  mpg_class
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 category 5
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 category 5
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 category 7
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 category 6
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 category 4
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 category 4
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 category 2
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 category 7
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 category 7
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 category 5
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 category 4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3 category 3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3 category 4
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3 category 2
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4 category 1
...
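As written, `1 + min_rank(mpg) %/% 4` puts ranks 1 to 3 in category 1 and the top-ranked car (Toyota Corolla, rank 32) in a ninth category. If eight groups of four ranks are wanted, a small variant (a sketch, not part of the original answer) subtracts 1 before the integer division:

```r
library(dplyr)

my_df <- mtcars %>%
  mutate(
    # Subtract 1 first so ranks 1-4 -> category 1, 5-8 -> category 2, ..., 29-32 -> category 8
    mpg_class = paste("category", 1 + (min_rank(mpg) - 1) %/% 4)
  )
```

Because min_rank() gives tied values the same rank, tied cars always share a category, so some groups can still hold slightly more or fewer than four cars.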

Original answer: A looped case_when seems complicated when you could do:

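lengths <- end_seq - beg_seq + 1
cat_labels <- rep(paste0("category", seq_along(lengths)), lengths)
# The r-th smallest mpg gets the r-th label; rows stay in their original order
my_df$mpg_class <- cat_labels[rank(my_df$mpg, ties.method = "first")]

This finds the length of each category, then makes a vector that repeats each category name as many times as the length of that category. Ranking mpg with ties.method = "first" gives every row a distinct position from 1 to 32, so each row receives the label for its position in the sorted mpg values (the same ordering as mpg_vals in the question), while my_df keeps its original row order.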

Jon Spring
  • Thanks Jon. The actual problem I was trying to solve was to mark which dates (in a column of dates) fall in each of several date ranges (e.g., range 1, range 2, ...). If an individual date in the column falls in a given range, a new column should record that range (period 1, period 2, ...). I will see if I can adapt your approach above to my actual problem. – DaveM Sep 05 '22 at 00:51
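For the date-range version of the problem, the same idea works without case_when: start with an NA column and fill in the rows that fall inside each range. This is a minimal base R sketch; the `events` and `ranges` data frames and their dates are hypothetical, since the real data isn't shown.

```r
# Hypothetical data: the real date column and ranges are not shown in the thread
events <- data.frame(
  date = as.Date(c("2022-01-15", "2022-03-02", "2022-06-30", "2022-12-25"))
)

# Hypothetical ranges: one row per period, inclusive start and end dates
ranges <- data.frame(
  start = as.Date(c("2022-01-01", "2022-03-01", "2022-06-01")),
  end   = as.Date(c("2022-01-31", "2022-04-30", "2022-06-30"))
)

# Start with NA so dates outside every range stay unlabelled
events$period <- NA_character_

for (i in seq_len(nrow(ranges))) {
  in_range <- events$date >= ranges$start[i] & events$date <= ranges$end[i]
  # Only rows inside range i are touched, so earlier labels are not wiped out
  events$period[in_range] <- paste("period", i)
}
```

Dates outside every range stay NA. If ranges overlap, a later range overwrites an earlier one, the same caveat jav raised about duplicated mpg values.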