
I hope my wording isn't too confusing. The dataset I am using records respondents' marital status and years of school attended (degree). I would like to create a measure of the highest degree held by a caregiver living in the household.

My variables look like this:

0 = single, never married

1 or 2 = partner in the house, so use the higher of the maternal highest degree and the partner's highest degree

3:7 = single-parent household, therefore use the maternal highest degree

# sample dataframe
household <- data.frame(
  ID = c(1, 2, 3, 4),
  marital = c(1, 4, 0, 2),
  education = c(14, 18, 10, 12),
  education_partner = c(18, NA, NA, 14)
)

I'm hoping to create a new column at the end of my data frame for the highest degree of a caregiver living in the home.

Expected output:

ID    marital    education    education_partner    highest_degree     
1        1          14              18                  18     
2        4          18              NA                  18
3        0          10              NA                  10
4        2          12              14                  14

I tried to write this code to return the maternal education if it's a single-parent household, but I don't know how to make it choose the higher of the two if it's a two-parent household (marital = 1 or 2). I'm also not sure whether an if/else statement is the best approach. I'm new to R, so any help is greatly appreciated. Thank you in advance!

if(household$marital =  0 | 3:7 )
  highest_degree<- (household$education)
ekoam
lziegs1

3 Answers


You can use the pmax function from base R to take the element-wise maximum across a set of columns in your data frame. Here that means comparing the education and education_partner columns, with na.rm = TRUE so that a missing partner value falls back to education.

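library(dplyr)  # provides %>% and mutate()

# `household` is the sample data frame from the question
new_data <- household %>%
  mutate(highest_degree = pmax(education, education_partner, na.rm = TRUE))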

Output:

  ID marital education education_partner highest_degree
1  1       1        14                18             18
2  2       4        18                NA             18
3  3       0        10                NA             10
4  4       2        12                14             14
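
One caveat: pmax on its own never looks at marital. With the sample data that makes no difference, because education_partner is NA wherever no partner is in the house, but a single-parent row that still carries a partner value would get the wrong answer. A minimal base R sketch that applies the rule from the question, assuming codes 1 and 2 are the only ones that mean a partner is present (has_partner is just an illustrative helper name):

```r
# Sample data from the question
household <- data.frame(
  ID = c(1, 2, 3, 4),
  marital = c(1, 4, 0, 2),
  education = c(14, 18, 10, 12),
  education_partner = c(18, NA, NA, 14)
)

# TRUE where a partner lives in the household (assumed: codes 1 and 2 only)
has_partner <- household$marital %in% c(1, 2)

# Partner present: higher of the two degrees, ignoring a missing value.
# Otherwise: the maternal degree.
household$highest_degree <- ifelse(
  has_partner,
  pmax(household$education, household$education_partner, na.rm = TRUE),
  household$education
)

household
```

Both ifelse and pmax are vectorized, so this works on the whole column at once without looping over rows.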
Ian Noriega

Here is a version using rowwise and c_across:

library(dplyr)
household %>% 
  rowwise() %>% 
  mutate(highest_degree = max(c_across(starts_with("education")), na.rm = TRUE)) 
     ID marital education education_partner highest_degree
  <dbl>   <dbl>     <dbl>             <dbl>          <dbl>
1     1       1        14                18             18
2     2       4        18                NA             18
3     3       0        10                NA             10
4     4       2        12                14             14
TarJae

If I understand correctly, you want to check whether marital status is 0 or between 3 and 7. If so, use education as the highest degree. If not, take the maximum of education and education_partner.

You'll want to do this for each row, so you can use dplyr and rowwise. Then use mutate to create a new variable that performs that conditional statement.

library(dplyr)

household %>%
  rowwise() %>% # evaluate the condition one row at a time
  mutate(
    highest_degree = if (marital == 0 | (marital >= 3 & marital <= 7)) {
      education
    } else {
      max(education_partner, education)
    }
  ) %>%
  ungroup()

gives

# A tibble: 4 x 5
     ID marital education education_partner highest_degree
  <dbl>   <dbl>     <dbl>             <dbl>          <dbl>
1     1       1        14                18             18
2     2       4        18                NA             18
3     3       0        10                NA             10
4     4       2        12                14             14
Brian Syzdek
  • Thank you Brian for the detailed explanation -- I learned a lot just in reading and trying what you wrote! – lziegs1 Jan 14 '22 at 16:30