
Is there a way to shorten the code below? I was thinking of using a loop or a similar function. The code generates a new variable (cat) from AgeatDeath and Disability. For example, cat gets the value '75.6-77.1' if AgeatDeath is at least 75.6 and below 77.1 and Disability equals 'No Intelectual and Developmental Disabilities'. Thanks, Nader

IDD <- IDD %>%
      mutate(
        cat = case_when(
          AgeatDeath >= 75.6 &
            AgeatDeath < 77.1  &
            Disability == 'No Intelectual and Developmental Disabilities' ~ '75.6-77.1',
          AgeatDeath >= 74.3 &
            AgeatDeath < 75.6  &
            Disability == 'No Intelectual and Developmental Disabilities' ~ '74.3-75.6',
          AgeatDeath >= 72.5 &
            AgeatDeath < 74.3  &
            Disability == 'No Intelectual and Developmental Disabilities' ~ '72.5-74.3',
          AgeatDeath >= 66.5 &
            AgeatDeath < 72.5  &
            Disability == 'No Intelectual and Developmental Disabilities' ~ '66.6-72.5',
          
          AgeatDeath >= 64.1 &
            AgeatDeath < 71.9  &
            Disability == 'Intellectual disability' ~ '64.1-71.9',
          AgeatDeath >= 62.3 &
            AgeatDeath < 64.1  &
            Disability == 'Intellectual disability' ~ '62.3-64.1',
          AgeatDeath >= 59.4 &
            AgeatDeath < 62.3  &
            Disability == 'Intellectual disability' ~ '59.4-62.3',
          AgeatDeath >= 50.4 &
            AgeatDeath < 59.4  &
            Disability == 'Intellectual disability' ~ '50.4-59.4',
          
          AgeatDeath >= 56.47 &
            AgeatDeath < 59.1  &
            Disability == 'Down syndrome' ~ '56.47-59',
          AgeatDeath >= 55.59 &
            AgeatDeath < 56.47  &
            Disability == 'Down syndrome' ~ '55.59-56.47',
          AgeatDeath >= 54.42 &
            AgeatDeath < 55.59  &
            Disability == 'Down syndrome' ~ '54.42-55.59',
          AgeatDeath >= 50.92 &
            AgeatDeath < 54.42  &
            Disability == 'Down syndrome' ~ '50.92-54.42',
          
          AgeatDeath >= 53.3 &
            AgeatDeath < 58.2  &
            Disability == 'Cerebral palsy' ~ '53.3-58.2',
          AgeatDeath >= 50.6 &
            AgeatDeath < 53.3  &
            Disability == 'Cerebral palsy' ~ '50.6-53.3',
          AgeatDeath >= 48.9 &
            AgeatDeath < 50.6  &
            Disability == 'Cerebral palsy' ~ '48.9-50.6',
          AgeatDeath >= 41.38 &
            AgeatDeath < 48.9  &
            Disability == 'Cerebral palsy' ~ '41.4-48.9',
          
          AgeatDeath >= 44.2 &
            AgeatDeath < 51.1  &
            Disability == 'Other rare developmental disabilities' ~ '44.2-51',
          AgeatDeath >= 41.6 &
            AgeatDeath < 44.2  &
            Disability == 'Other rare developmental disabilities' ~ '41.6-44.2',
          AgeatDeath >= 30.6 &
            AgeatDeath < 38.4  &
            Disability == 'Other rare developmental disabilities' ~ '30.6-38.4',
          AgeatDeath >= 38.4 &
            AgeatDeath < 41.6  &
            Disability == 'Other rare developmental disabilities' ~ '38.4-41.6'
        )
      )
Nader Mehri

2 Answers


Some subsetting and the function cut() can go a long way. What I'll demonstrate doesn't involve dplyr.

First, create an empty new variable. The rest of the code fills it in just a few lines.

IDD$cat <- NA_character_

Next, create a list with the values of Disability and the corresponding cutpoints. We'll loop through this list.

L <- list(
`No Intelectual and Developmental Disabilities` = c(66.6, 72.5, 74.3, 75.6, 77.1),
`Intellectual disability` = c(50.4, 59.4, 62.3, 64.1, 71.9)
)

You can fill in the rest. Now, we'll use a loop to subset by each value of Disability, use cut() to split the values into categories and rename the categories.

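for (d in names(L)) {
   rows <- which(IDD$Disability == d)   # rows belonging to this group
   IDD$cat[rows] <- as.character(
                       cut(IDD$AgeatDeath[rows],
                           breaks = L[[d]],
                           # lower bounds drop the last break, upper bounds drop the first
                           labels = paste(head(L[[d]], -1), L[[d]][-1], sep = "-"),
                           include.lowest = TRUE,
                           right = FALSE))
}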

cut() splits AgeatDeath into intervals at the breakpoints stored in L, and the labels are built from those same breakpoints. right = FALSE makes each category include its lower bound and exclude its upper bound. include.lowest = TRUE ensures that a value exactly equal to the highest breakpoint still falls in the highest category. Values outside the range of the breakpoints become NA. as.character() turns the result into a character vector rather than a factor.
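As a quick check of how these arguments behave, here is a minimal sketch that applies cut() to a few made-up ages using the breakpoints of the first group. The ages vector is illustrative only and does not come from the real IDD data.

```r
# Illustrative values only; not taken from the real IDD data
breaks <- c(66.6, 72.5, 74.3, 75.6, 77.1)
ages   <- c(66.6, 72.01, 72.5, 77.1, 80)

# One label per interval: lower bounds drop the last break,
# upper bounds drop the first
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

groups <- as.character(
  cut(ages, breaks = breaks, labels = labels,
      include.lowest = TRUE, right = FALSE)
)

# 72.5 sits on a boundary and goes to the interval it opens;
# 77.1 is the top break and is kept by include.lowest = TRUE;
# 80 is outside every interval and becomes NA
print(data.frame(ages, groups))
```

Printing the ages next to their groups like this makes it easy to see which values end up as NA before running the loop on the full data.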

Noah
  • Thanks. I tried the code below, but it did not capture some values of AgeatDeath including 72.01 for the no intellectual and developmental disabilities. What is wrong with the breaks: L <- list( `No Intelectual and Developmental Disabilities` = c(66.6, 72.5, 74.3, 75.6, 77.1) ) – Nader Mehri Nov 20 '20 at 02:58

Regardless of the approach you take, you are still going to need to store the thresholds and conditions somewhere. Right now these are written into your code, but they could be moved to a table.

Consider setting up a table

order | min_age | max_age | disability
------+---------+---------+------------------------------------------------
1     | 75.6    | 77.1    | 'No Intelectual and Developmental Disabilities'
2     | 74.3    | 75.6    | 'No Intelectual and Developmental Disabilities'
etc.

Then you can use the table to set up the conditions. Following the parse_exprs method (from the rlang package) described in this question:

# loading of condition table
# other setup
# etc.
library(dplyr)
library(rlang)  # provides parse_exprs()

# ensure conditions are in the preferred order
twc = table_w_conditions %>%
  arrange(order)

# make text strings of conditions
# (paste0 avoids stray spaces inside the labels; twc$disability is assumed
# to already contain the surrounding quotes, as in the table above)
conditions = paste0("AgeatDeath >= ", twc$min_age,
                    " & AgeatDeath < ", twc$max_age,
                    " & Disability == ", twc$disability,
                    " ~ '", twc$min_age, "-", twc$max_age, "'")

# mutate treating text strings as code
IDD <- IDD %>%
  mutate(
        cat = case_when(!!!parse_exprs(conditions))
  )

If you take this approach, I recommend you review conditions before using it, to check that it contains a vector of text strings with the correct condition text.
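One possible way to do that review, sketched in base R only: build the strings from a small stand-in table, print them, and confirm that each one parses. The two-row table_w_conditions below is a hypothetical example, and its disability column is assumed to store the value with its quotes included. paste0 is used here so that no extra spaces end up inside the labels.

```r
# Hypothetical two-row condition table; the real one would hold every group
table_w_conditions <- data.frame(
  order      = c(1, 2),
  min_age    = c(75.6, 74.3),
  max_age    = c(77.1, 75.6),
  disability = "'No Intelectual and Developmental Disabilities'",
  stringsAsFactors = FALSE
)

# Sort by the order column (base R equivalent of arrange(order))
twc <- table_w_conditions[order(table_w_conditions$order), ]

# Build one condition string per row
conditions <- paste0(
  "AgeatDeath >= ", twc$min_age,
  " & AgeatDeath < ", twc$max_age,
  " & Disability == ", twc$disability,
  " ~ '", twc$min_age, "-", twc$max_age, "'"
)

# Inspect the generated text by eye
print(conditions)

# parse() raises an error if any string is not syntactically valid R
parsed <- parse(text = conditions)
print(length(parsed) == length(conditions))
```

A successful parse only shows that the strings are valid R, not that the thresholds are right, so reading the printed conditions is still worthwhile.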

Simon.S.A.