0

I have a dataframe that looks like below.

ID number is the person's ID, position is their job position, Dt_Alter is the date when they changed role and Education is their background.

I need to calculate their average time in position (until they get to the manager position) and also how many times they changed role until they get to manager.

Any tips I appreciate as I am new to R and having a hard time at this part of the analysis. The dataframe is huge.

ID Number        Position       Dt_Alter    Education
        2         MANAGER     2019-02-01    BUSINESS, MANAGEMENT AND ADMINISTRATION
        2     COORDINATOR     2019-01-01    BUSINESS, MANAGEMENT AND ADMINISTRATION
  2000261         MANAGER     2018-12-01    BUSINESS, MANAGEMENT AND ADMINISTRATION
  2000261      SUPERVISOR     2016-12-01    BUSINESS, MANAGEMENT AND ADMINISTRATION
  2000553         MANAGER     2018-12-01    ENGINEERING
  2000553     COORDINATOR     2016-04-01    ENGINEERING




structure(list(Matricula = c(2L, 2L, 2L, 2L, 2L), 
Desc2 = c("GERENTE", "COORDENADOR SEGUROS", "COORDENADOR SEGUROS", "COORDENADOR SEGUROS", "COORDENADOR SEGUROS"), 
Dt_Alteracao = c("01/02/2019", "01/01/2019", "01/01/2018", "01/09/2017", "01/09/2016"), 
Education = c("BUSINESS, MANAGEMENT AND ADMINISTRATION", "BUSINESS, MANAGEMENT AND ADMINISTRATION", "BUSINESS, MANAGEMENT AND ADMINISTRATION",  "BUSINESS, MANAGEMENT AND ADMINISTRATION", "BUSINESS, MANAGEMENT AND ADMINISTRATION")), 
row.names = c("2.10823", "2.10824", "2.10825", "2.10826", "2.10827"), class = "data.frame")
OTStats
  • 1,820
  • 1
  • 13
  • 22
Maria
  • 11
  • 4
  • 1
    Would you please post a small sample of the data in a reproducible manner using dput()? – Bill O'Brien Sep 09 '19 at 16:27
  • sure, but its too long to put it in here. How do I do in this case? – Maria Sep 09 '19 at 17:47
  • 1
    The dput function is useful. https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Bill O'Brien Sep 09 '19 at 17:49
  • 1
    Try something like `dput(head(df))`. If `head(df)` isn't sufficiently illustrative of the question, first extract another small dataframe which is adequately informative. Note that any such `dput()` would belong in the question rather than the comments. – John Coleman Sep 09 '19 at 17:53
  • 1
    woops, I have to put it in the question, hold on – Maria Sep 09 '19 at 17:57

1 Answers1

1

Below is my rough approach to this problem, using a combination of ifelse and lag functions. Basically, after ensuring you've ordered the file on ID.Number and date, the order of records should permit comparisons across records. I made a flag for whether someone changed their position, and if true, computed the difftime across those records.

Hope this helps.

df$Matricula<-as.character(df$Matricula)
df$Dt_Alteracao<-strptime(df$Dt_Alteracao,format="%d/%m/%Y")
df<-df[order(df$Matricula, df$Dt_Alteracao), ]

# indicator for whether a position change occurred
df$changePos<-ifelse( df$Matricula== lag(df$Matricula,1) & df$Desc2 != lag(df$Desc2,1), 
                  "Changed Position", "Same")   # review this logic for a variety of row groupings

# measure weeks between positions
library(lubridate)
df$Dt_Alteracao2<-as.POSIXct(df$Dt_Alteracao)
df$time_in_pos<-ifelse(df$changePos=="Changed Position", 
                   difftime(lag(df$Dt_Alteracao2,1),df$Dt_Alteracao2,units ='weeks'),NA )
Ben
  • 1,113
  • 10
  • 26
  • thanks Ben for your help ;) I've got an error message in the last code: In unclass(time1) - unclass(time2) : longer object length is not a multiple of shorter object length – Maria Sep 09 '19 at 20:14
  • Looks like you need to convert the date variable to `POSIXct` format. I updated the code above. Also, I assume your date format is day-month-YEAR. Make sure that's specified correctly in your code. – Ben Sep 10 '19 at 16:23