I have data on peoples employment status monthly for 1 year. The dataframe includes 4 variables: ID of a Person, Country, Month and Main Activity in that specific month (Employed, Unemployed, Inactive, Other). I put an example of it here:
ID <- c(1:10, 1:10)
country <- c("AT", "BE", "CH", "CZ", "HR", "SO", "SV", "RU", "GR", "GE", "AT", "BE", "CH", "CZ", "HR",
"SO", "SV", "RU", "GR", "GE")
month <- c("Jan", "Feb", "Mar", "Apr", "May", "Aug", "Dec", "Nov", "Sep", "Jan", "Jun", "Jul", "Oct",
"Jan", "Feb", "Mar", "Apr", "May", "Aug", "Dec")
act <- c("Unemployed", "Employed", "Other", "Other", "Inactive", "Unemployed", "Employed",
"Employed", "Employed", "Unemployed", "Other", "Unemployed", "Unemployed", "Unemployed",
"Other", "Other", "Employed", "Other", "Other", "NA")
df <- data.frame(ID, country, month, act)
df[order(ID),]
ID country month act
1 1 AT Jan Unemployed
11 1 AT Jun Other
21 1 AT Nov Unemployed
2 2 BE Feb Employed
12 2 BE Jul Unemployed
22 2 BE Sep Unemployed
3 3 CH Mar Other
13 3 CH Oct Unemployed
23 3 CH Jan NA
4 4 CZ Apr Other
14 4 CZ Jan Unemployed
24 4 CZ Jun Unemployed
5 5 HR May Inactive
15 5 HR Feb Other
25 5 HR Jul Other
6 6 SO Aug Unemployed
16 6 SO Mar Other
26 6 SO Oct Employed
7 7 SV Dec Employed
17 7 SV Apr Employed
27 7 SV Nov Employed
8 8 RU Nov Employed
18 8 RU May Other
28 8 RU Jan NA
9 9 GR Sep Employed
19 9 GR Aug Other
29 9 GR Jun Inactive
10 10 GE Jan Unemployed
20 10 GE Dec NA
30 10 GE Aug Unemployed
My goal is to create a new dataframe, where every row represents one spell of employment but with the condition that prior to AND after the spell of employment had to be a spell of unemployment. So that I am able to include only spells of employment where people switched from unemployment to employment and back to unemployment and calculate the duration of these spells. Ideally in the end there would be 4 variables: PersID, Country, duration of spell, starting month, end month. It should look like this:
ID country spell_duration starting ending
1 1 AT 5 Jan May
11 1 AT 5 Jun Oct
2 2 BE 7 Feb Aug
12 2 BE 6 Jul Dec
3 3 CH 10 Mar Dec
13 3 CH 1 Oct Oct
4 4 CZ 8 Apr Nov
14 4 CZ 5 Jan May
5 5 HR 5 May Sep
15 5 HR 4 Feb May
6 6 SO 2 Aug Sep
16 6 SO 6 Mar Aug
7 7 SV 1 Dec Dec
17 7 SV 9 Apr Dec
8 8 RU 8 Nov Dec
18 8 RU 7 May Nov
9 9 GR 3 Sep Nov
19 9 GR 2 Aug Sep
10 10 GE 8 Jan Aug
20 10 GE 1 Dec Dec
I already found this solution by Maria (How to calculate number and duration of categorical spells by ID in R) but her problem is different. I don't want the overall duration of employment and I also don't really need the number of spells