0

I have a question related to ranking rows for each subgroup. Suppose I have a data about the employee information of a company. In this company, there are three departments (Alpha, Beta and Sigma) and in each department there are two teams (team 1 and team 2). A person can be either a member or a leader in the team.

The issue is a person can belong to multiple teams with different roles across departments. What I want to do is for people with multiple positions, I want to rank what is his/her main job and what are his/her secondary jobs. The ranking logic is that the position is the most important criteria: Leader>Member. If positions are equal, the department is the next most important criteria: Alpha>Beta>Sigma. If departments are also equal, then I should look at teams: team 1>team>2. Below is a sample of what the result should look like:

Person_Name  Department_Name  Team_Name     Team_Position   Ranking
       Tony        Alpha              1           Member    1
       Tony        Beta               1           Member    2

       Bill        Alpha              1           Leader    1
       Bill        Alpha              2           Leader    2
       Bill        Beta               1           Leader    3

       Luke        Alpha              1           Member    2
       Luke        Beta               1           Leader    1
       Luke        Sigma              1           Member    3

I think I should be able to do this with dplyr package. However, I am struggling on the details of the codes and not sure how to avoid for/if loops because the real data have much more departments and teams. Thanks!

RandomThinker
  • 391
  • 1
  • 6

1 Answers1

1

In this case, you can arrange the data and assign row number for each Person_Name.

library(dplyr)
df %>%
  arrange(Person_Name, Team_Position, Department_Name, Team_Name) %>%
  group_by(Person_Name) %>%
  mutate(Ranking = row_number())


#  Person_Name Department_Name Team_Name Team_Position Ranking
#  <chr>       <chr>               <int> <chr>           <int>
#1 Bill        Alpha                   1 Leader              1
#2 Bill        Alpha                   2 Leader              2
#3 Bill        Beta                    1 Leader              3
#4 Luke        Beta                    1 Leader              1
#5 Luke        Alpha                   1 Member              2
#6 Luke        Sigma                   1 Member              3
#7 Tony        Alpha                   1 Member              1
#8 Tony        Beta                    1 Member              2

The above works because co-incidentally "Leader" comes earlier than "Member" and similarly Department_Name and Team_Name follow the same order as we want.


A more general solution would be to change this columns to factors in the level that we want and then arrange and create Ranking column for Person_Name.

df %>%
  mutate(Team_Position = factor(Team_Position, levels = c('Leader', 'Member')), 
         Department_Name = factor(Department_Name, 
                              levels = c('Alpha', 'Beta', 'Sigma')), 
         Team_Name = factor(Team_Name, levels = c(1, 2))) %>%
   arrange(Person_Name, Team_Position, Department_Name, Team_Name) %>%
   group_by(Person_Name) %>%
   mutate(Ranking  = row_number())
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213