I have a dataset that currently lists student information on a per-term basis (e.g., 201610, 201620, 201630, 201640, 201710), where the suffix 10 = fall, 20 = winter, 30 = spring, and 40 = summer. Not all terms are necessarily listed for every student.
What I would like to do is identify the first term in which a student was enrolled, presumably the fall, as T1, and subsequent terms as T2, T3, etc. Since some students may take a winter or summer term, I would like to identify those as T1_Winter, T2_Summer, etc.
I've been able to isolate the individual terms in which a student has enrolled, and to number the first, intermediate, and last terms as 1, 2, 3, etc. However, I can't wrap my head around how to number fall and spring as 1, 2, 3, 4, and the intermediary terms, winter and summer, as 1.5, 2.5, 3.5, 4.5, etc.
# Create the sample dataset
data <- data.frame(
ID = c(1, 1, 1, 2, 2, 2, 2),
RegTerm = c(201810, 201820, 201830, 201910, 201930, 201940, 202010)
)
# Isolate student IDs and terms
stdTerm <- subset(data, select = c("ID","RegTerm"))
# Sort according to ID and RegTerm
stdTerm <- stdTerm[
with(stdTerm, order(ID, RegTerm)),
]
# Remove duplicate combinations of ID and term
y <- stdTerm[!duplicated(stdTerm[c(1,2)]),]
# Create an index to identify the term number
# for which a student enrolled
library(dplyr)
z <- y %>%
arrange(ID, RegTerm) %>%
group_by(ID) %>%
mutate(StdTermIndex = row_number())
Right now, it's identifying the progression of all terms for a student as 1, 2, 3, etc., but not winter and summer as intermediary terms. That is, if a student enrolled in fall and winter, winter will appear as 2 and spring will appear as 3.
In the sample data provided, I would like Student ID 1 to reflect 201810 as 1, 201820 as 1.5, and 201830 as 2, etc. Any suggestions or previous code I could reference to wrap my head around how I can code the intermediary semesters?
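To make the target concrete, here is a minimal base R sketch of the kind of logic I have in mind. It is not a confirmed solution. It assumes that suffixes 10 and 30 (fall and spring) are the "main" terms and that 20 and 40 (winter and summer) are the intermediary ones. The names `sketch`, `is_main`, and `main_count` are made up for illustration. The idea is to keep a running count of main terms per student and add 0.5 for an intermediary term.

```r
# Same sample data as above, kept in a separate object (hypothetical name)
sketch <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2),
  RegTerm = c(201810, 201820, 201830, 201910, 201930, 201940, 202010)
)

# Sort by student and term so the running count follows enrollment order
sketch <- sketch[order(sketch$ID, sketch$RegTerm), ]

# Assumption: the last two digits encode the term (10/30 main, 20/40 intermediary)
is_main <- (sketch$RegTerm %% 100) %in% c(10, 30)

# Running count of main terms within each student
main_count <- ave(as.integer(is_main), sketch$ID, FUN = cumsum)

# Main terms get the count itself; intermediary terms sit half a step after it
sketch$StdTermIndex <- ifelse(is_main, main_count, main_count + 0.5)

print(sketch$StdTermIndex)
# 1.0 1.5 2.0 1.0 2.0 2.5 3.0
```

Two caveats with this sketch: a student whose first enrolled term is winter or summer would start at 0.5, and the index counts enrolled main terms rather than calendar terms, so a skipped spring does not leave a gap. The same `cumsum` logic should also fit inside the `group_by(ID) %>% mutate(...)` pipeline above.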