Iterate over two dataframes to create a criterion column in R

Question

I am looking to form a final data frame by iterating over two data frames.

The first df looks like this


employee_name <- c("rob", "peter")
employee_attribute1 <- c("10", "5")
employee_attribute2 <- c("5", "5")
employee_df <- data.frame(employee_name, employee_attribute1,employee_attribute2)

the second one looks like this. Employees can and do belong to multiple groups

employee_group <- c("1", "2","3" )
employee_attributes <- c("employee_attribute2", "employee_attribute2","employee_attribute1" )
group_att_mapping_df <- data.frame(employee_group,employee_attributes)

I would like to create a data frame like this. The rating condition is if employee attribute score in employee_df for the employee_attributes ( from group_att_mapping_df) = 10 then the rating is 1 else it is 2.

 employee_group| employee_name | employee_rating
     1         | Rob            |  2 
     1         | Peter          |  2
     2         | Rob            |  2
     2         |Peter           |  2
     3         | Rob            |  1
     3         | Peter         |  2

Need help writing valid R code to iterate over the two data frames. I will attach a screenshot of the loop I have written so far. Many thanks in advance.

(1) Your sample data is barely usable, the `...` doesn't parse, so we need to edit your code just to make it work *a little*. After that, we only get a few rows, not the six rows you have in your expected output. Please give whole-enough data that actually parses correctly; even better, just [edit] your question and add the output from `dput(x)` (where `x` is a sample of your data, perhaps `head(employee_df)`). (2) This seems like a `merge` followed by column-wise calcs, see https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272 for good merge/join discussion. — r2evans, Feb 22 '22 at 07:13
The rows are enough to create the expected output. But please remove the `...` as suggested by @r2evans — jpiversen, Feb 22 '22 at 07:18

jpiversen · Accepted Answer · 2022-02-22T07:28:08.147

You don't need to iterate over the dataframes. You can use pivoting and joining to get the expected output:

employee_df %>% 
  tidyr::pivot_longer(
    contains("attribute"), 
    names_to = "employee_attributes", 
    values_to = "attribute_score"
  ) %>% 
  left_join(group_att_mapping_df) %>% 
  mutate(employee_rating = ifelse(attribute_score == 10, 1, 2)) %>% 
  select(employee_group, employee_name, employee_rating) %>% 
  arrange(employee_group)

#> Joining, by = "employee_attributes"
#> # A tibble: 6 x 3
#>   employee_group employee_name employee_rating
#>   <chr>          <chr>                   <dbl>
#> 1 1              rob                         2
#> 2 1              peter                       2
#> 3 2              rob                         2
#> 4 2              peter                       2
#> 5 3              rob                         1
#> 6 3              peter                       2

Explaination:

First we pivot employee_df to get all the attributes variables as one column.
Next we left join with group_att_mapping_df to get the employee_group column.
Then we create the employee_rating column based on your logic: 1 if the attribute score is 10, and 2 otherwise.
Finally I just make the data look the same as your expected output, by selecting the columns you want and sorting them by employee_group.

Data without the `...`

employee_name <- c("rob", "peter")
employee_attribute1 <- c("10", "5")
employee_attribute2 <- c("5", "5")
employee_df <- data.frame(employee_name, employee_attribute1,employee_attribute2)

employee_group <- c("1", "2","3")
employee_attributes <- c("employee_attribute2", "employee_attribute2","employee_attribute1")
group_att_mapping_df <- data.frame(employee_group,employee_attributes)

Thank you. This works great. I had to just use gather() instead of pivot_longer() as I have an older version of tidyr. — HashingData, Feb 23 '22 at 01:29

Iterate over two dataframes to create a criterion column in R

1 Answers1

Data without the ...

Data without the `...`