0

I have a dataframe:

df <- structure(list(User.Employee.ID = c(40000583L, 40000494L, 40000273L, 
40000273L, 40001226L, 40000688L, 40000583L, 40000688L, 40001226L, 
40001226L, 40001226L, 3002211L, 40001115L, 40000494L, 40001226L, 
3002211L, 3002211L, 40000688L, 40000688L, 40001115L, 40001115L, 
40000494L, 40000494L, 40000583L, 40000273L, 40000273L), Event.Date = c("4/1/2020", 
"6/30/2020", "1/1/2021", "1/1/2021", "4/1/2021", "10/1/2021", 
"11/1/2021", "9/1/2020", "1/1/2021", "1/1/2021", "1/1/2021", 
"4/1/2021", "10/1/2020", "10/1/2021", "1/1/2020", "1/1/2020", 
"5/1/2020", "9/1/2020", "1/1/2020", "1/1/2020", "10/1/2020", 
"1/1/2020", "5/5/2020", "1/1/2020", "1/1/2020", "4/1/2020"), 
    End.Date = c("10/31/2021", "9/30/2021", "1/1/2021", "12/31/9999", 
    "12/31/9999", "12/31/9999", "12/31/9999", "9/30/2021", "1/1/2021", 
    "1/1/2021", "3/31/2021", "12/31/9999", "12/31/9999", "12/31/9999", 
    "12/31/2020", "4/30/2020", "5/31/2020", "9/1/2020", "8/31/2020", 
    "9/30/2020", "10/1/2020", "5/4/2020", "6/29/2020", "3/31/2020", 
    "3/31/2020", "12/31/2020"), Event = c("Promotion", "Return to Work", 
    "Data Change", "Data Change", "Promotion", "Promotion", "Rehire", 
    "Data Change", "Data Change", "Data Change", "Data Change", 
    "Promotion", "Data Change", "Promotion", "Data Change", "Data Change", 
    "Data Change", "Rehire", "Data Change", "Data Change", "Promotion", 
    "Data Change", "Suspension", "Data Change", "Data Change", 
    "Promotion"), Title = c("Project Leader", "Project Leader", 
    "Business Analyst GSI_niv. 2", "Business Analyst GSI_niv. 2", 
    "Senior Buyer-Niv.3", "Professional Account Manager (KAM)-Niv.2", 
    "Contract Manager", "Junior Account Manager (KAM)-Niv.1", 
    "Buyer-Niv.2", "Buyer-Niv.2", "Buyer-Niv.2", "Head of Group Electrical Equipment", 
    "Senior Expert Engineering_Niv. 3", "Senior Project Leader", 
    "Buyer-Niv.2", "Professional Account Manager (KAM)-Niv.2", 
    "Professional Account Manager (KAM)-Niv.2", "Junior Account Manager (KAM)-Niv.1", 
    "Manager Back Office", "Expert Engineering_Niv. 2", "Senior Expert Engineering_Niv. 3", 
    "Project Leader", "Project Leader", "Junior Project Leader", 
    "Data expert ALC", "Data expert ALC")), row.names = c(NA, 
-26L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
    User.Employee.ID = c(3002211L, 40000273L, 40000494L, 40000583L, 
    40000688L, 40001115L, 40001226L), .rows = structure(list(
        c(12L, 16L, 17L), c(3L, 4L, 25L, 26L), c(2L, 14L, 22L, 
        23L), c(1L, 7L, 24L), c(6L, 8L, 18L, 19L), c(13L, 20L, 
        21L), c(5L, 9L, 10L, 11L, 15L)), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -7L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

User.Employee.ID Event.Date End.Date   Event          Title           
              <int> <date>     <date>     <chr>          <chr>           
 1         40000494 2020-06-30 2020-09-30 Return to Work Project Leader  
 2         40001226 2020-04-01 1999-12-31 Promotion      Senior Buyer-Ni~
 3         40001226 2020-01-01 2020-01-01 Data Change    Buyer-Niv.2     
 4         40001226 2020-01-01 2020-01-01 Data Change    Buyer-Niv.2     
 5         40001226 2020-01-01 2020-03-31 Data Change    Buyer-Niv.2     
 6         40001115 2020-10-01 1999-12-31 Data Change    Senior Expert E~
 7         40000494 2020-10-01 1999-12-31 Promotion      Senior Project ~
 8         40001226 2020-01-01 2020-12-31 Data Change    Buyer-Niv.2     
 9         40001115 2020-01-01 2020-09-30 Data Change    Expert Engineer~
10         40001115 2020-10-01 2020-10-01 Promotion      Senior Expert E~
11         40000494 2020-01-01 2020-05-04 Data Change    Project Leader  
12         40000494 2020-05-05 2020-06-29 Suspension     Project Leader 
.....

In this dataframe, there are examples of three users who have been promoted in their organization. I would like to copy by user the old job value from Title column in the same row with Promotion value in Event column. On the example of the user 40001115: I need to take the Event.Date value of Promotion row which is 2020-10-01 and find for that user the previous month in End.Date which is 2020-09-30. Then copy the Title (Expert Engineering_Niv. 2) of that previous month to the Promotion row, so it would look like:

desiredresult <- data.frame(User.Employee.ID = c("40000494","40000494","40000494","40000494", "40001115", "40001115", "40001115", "40001226",  "40001226",  "40001226",  "40001226",  "40001226"),
                            Event.Date = c("2020-06-30","2020-10-01","2020-01-01","2020-05-05", "2020-10-01", "2020-01-01", "2020-10-01", "2020-04-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01"),
                            End.Date = c("2020-09-30", "1999-12-31", "2020-05-04", "2020-06-29", "1999-12-31", "2020-09-30", "2020-10-01", "1999-12-31", "2020-01-01", "2020-01-01", "2020-03-31", "2020-12-31"),
                            Event = c("Return to Work", "Promotion", "Data Change", "Suspension", "Data Change", "Data Change", "Promotion", "Promotion", "Data Change", "Data Change", "Data Change", "Data Change"), 
                            Title = c("Project Leader", "Senior Project Leader", "Project Leader", "Project Leader", "Senior Expert Engineering_Niv. 3", "Expert Engineering_Niv. 2", "Senior Expert Engineering_Niv. 3", "Senior Buyer-Niv.3", "Buyer-Niv.2", "Buyer-Niv.2", "Buyer-Niv.2", "Buyer-Niv.2"),
                            Old_Title = c(NA, "Project Leader", NA, NA, NA, NA, "Expert Engineering_Niv. 2", "Buyer-Niv.2", NA, NA, NA, NA))


  User.Employee.ID Event.Date   End.Date          Event                            Title                 Old_Title
1          40000494 2020-06-30 2020-09-30 Return to Work                   Project Leader                      <NA>
2          40000494 2020-10-01 1999-12-31      Promotion            Senior Project Leader            Project Leader
3          40000494 2020-01-01 2020-05-04    Data Change                   Project Leader                      <NA>
4          40000494 2020-05-05 2020-06-29     Suspension                   Project Leader                      <NA>
5          40001115 2020-10-01 1999-12-31    Data Change Senior Expert Engineering_Niv. 3                      <NA>
6          40001115 2020-01-01 2020-09-30    Data Change        Expert Engineering_Niv. 2                      <NA>
7          40001115 2020-10-01 2020-10-01      Promotion Senior Expert Engineering_Niv. 3 Expert Engineering_Niv. 2
8          40001226 2020-04-01 1999-12-31      Promotion               Senior Buyer-Niv.3               Buyer-Niv.2
9          40001226 2020-01-01 2020-01-01    Data Change                      Buyer-Niv.2                      <NA>
10         40001226 2020-01-01 2020-01-01    Data Change                      Buyer-Niv.2                      <NA>
11         40001226 2020-01-01 2020-03-31    Data Change                      Buyer-Niv.2                      <NA>
12         40001226 2020-01-01 2020-12-31    Data Change                      Buyer-Niv.2                      <NA>

I have tried something using the lubridate package but it does not give me the desired output:

library(lubridate)
df %>% 
  group_by(User.Employee.ID) %>% 
  mutate(applied = +(row_number() %in% 
                              findInterval(lubridate::mdy(first(Event.Date)), 
                                           lubridate::mdy(End.Date))))

Any help or suggestions would be appreciated!

MariKo
  • 305
  • 4
  • 15

1 Answers1

0

This code would do the job:

df <- df %>% group_by(User.Employee.ID) %>% arrange(User.Employee.ID, Event.Date) %>%  
              mutate(Old_Title = ifelse(Event == "Promotion", lag(Title,1),NA))

> df
# A tibble: 12 x 6
# Groups:   User.Employee.ID [3]
   User.Employee.ID Event.Date End.Date   Event          Title                            Old_Title                       
              <int> <date>     <date>     <chr>          <chr>                            <chr>                           
 1         40000494 2020-01-01 2020-05-04 Data Change    Project Leader                   NA                              
 2         40000494 2020-05-05 2020-06-29 Suspension     Project Leader                   NA                              
 3         40000494 2020-06-30 2020-09-30 Return to Work Project Leader                   NA                              
 4         40000494 2020-10-01 1999-12-31 Promotion      Senior Project Leader            Project Leader                  
 5         40001115 2020-01-01 2020-09-30 Data Change    Expert Engineering_Niv. 2        NA                              
 6         40001115 2020-10-01 1999-12-31 Data Change    Senior Expert Engineering_Niv. 3 NA                              
 7         40001115 2020-10-01 2020-10-01 Promotion      Senior Expert Engineering_Niv. 3 Senior Expert Engineering_Niv. 3
 8         40001226 2020-01-01 2020-01-01 Data Change    Buyer-Niv.2                      NA                              
 9         40001226 2020-01-01 2020-01-01 Data Change    Buyer-Niv.2                      NA                              
10         40001226 2020-01-01 2020-03-31 Data Change    Buyer-Niv.2                      NA                              
11         40001226 2020-01-01 2020-12-31 Data Change    Buyer-Niv.2                      NA                              
12         40001226 2020-04-01 1999-12-31 Promotion      Senior Buyer-Niv.3               Buyer-Niv.2 

Note: there is a small difference between your desired output and mine, but looking at your desired data frame, the Event.Date column is not sorted by date at all. I feel you sorted the User column only. Here I did both, because I believe it will make more sense? Let me know otherwise.

Edit following comments:

> df <- df %>% group_by(User.Employee.ID) %>% arrange(User.Employee.ID, Event.Date) %>%  
+                 mutate(Old_Title = ifelse(Event == "Promotion", ifelse(Event.Date==lag(Event.Date,1),lag(Title,2),lag(Title,1)),NA))
> df
# A tibble: 12 x 6
# Groups:   User.Employee.ID [3]
   User.Employee.ID Event.Date End.Date   Event          Title                            Old_Title                
              <int> <date>     <date>     <chr>          <chr>                            <chr>                    
 1         40000494 2020-01-01 2020-05-04 Data Change    Project Leader                   NA                       
 2         40000494 2020-05-05 2020-06-29 Suspension     Project Leader                   NA                       
 3         40000494 2020-06-30 2020-09-30 Return to Work Project Leader                   NA                       
 4         40000494 2020-10-01 1999-12-31 Promotion      Senior Project Leader            Project Leader           
 5         40001115 2020-01-01 2020-09-30 Data Change    Expert Engineering_Niv. 2        NA                       
 6         40001115 2020-10-01 1999-12-31 Data Change    Senior Expert Engineering_Niv. 3 NA                       
 7         40001115 2020-10-01 2020-10-01 Promotion      Senior Expert Engineering_Niv. 3 Expert Engineering_Niv. 2
 8         40001226 2020-01-01 2020-01-01 Data Change    Buyer-Niv.2                      NA                       
 9         40001226 2020-01-01 2020-01-01 Data Change    Buyer-Niv.2                      NA                       
10         40001226 2020-01-01 2020-03-31 Data Change    Buyer-Niv.2                      NA                       
11         40001226 2020-01-01 2020-12-31 Data Change    Buyer-Niv.2                      NA                       
12         40001226 2020-04-01 1999-12-31 Promotion      Senior Buyer-Niv.3               Buyer-Niv.2     

Edit following new comment: run first this to transform the Event.Date column from class character to class date:

df$Event.Date <- as.Date(df$Event.Date, format =  "%m/%d/%Y")
Mata
  • 538
  • 3
  • 17
  • thank you for your reply but your solution works partially (I have tried it myself before). See user 40001115. In his/her case, the Old_Title should be Expert Engineering_Niv. 2 but not Senior Expert Engineering_Niv. 3. You code arranges the Event.Date in order but the problem with my data set is that the upper row can contain the same date. – MariKo Nov 30 '21 at 15:26
  • What I need is to take the date of the Promotion from Event.Date column and link it to the previous month from the End.Date. So, in case of user 40001115, the code takes 2020-10-01 from Event.Date, finds the preceding month in End.Date column which is 2020-09-30 and copies the value Expert Engineering_Niv. 2 from this preceding month to the Promotion row. It is a bit complicated but hopefully my explanation helps. Otherwise, let me know. – MariKo Nov 30 '21 at 15:26
  • Ok, I think I got it. I edited my code following your comments, is this ok for you? – Mata Nov 30 '21 at 15:34
  • Thank you for your edit but the code works partially: for some users it does not output the desired outcome. I have edited the initial dataset in my question and added a user for who the code does not do the trick. user # 40000688. It outputs Manager Back Office when it should be Junior Account Manager (KAM)-Niv.1. It looks like the sorting on Event.Date for this specific user did not really work (I have several examples like that in my bigger dataset). Could you, please, have a look? – MariKo Dec 09 '21 at 10:16
  • Hi MariKo. To be fair, the code worked on your initial data, just not on the new one. Ideally, show your edit instead of replacing it all. Anyway, it was actually a date format issue. If you run first `df$Event.Date <- as.Date(df$Event.Date, format = "%m/%d/%Y")`, it will properly transform your column from class character to class Date, then it will sort them properly. You can also see it [there] (https://stackoverflow.com/questions/4310326/convert-date-time-string-to-class-date) for more details. I edited my code to add this line. If the answer works for you, you could accept it. – Mata Dec 13 '21 at 14:32