I almost have what I need, but I need help with one last detail. The data set is produced by the following code:
set.seed(29) # set the seed before sampling so the data are reproducible
stu_vec <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
college_vec <- c("ATC", "CCTC", "DTC", "FDTC", "GTC", "NETC", "USC", "Clemson", "Winthrop", "Allen")
sctcs <- c("ATC", "CCTC", "DTC", "FDTC", "GTC", "NETC")
Student <- sample(stu_vec, size = 100, replace = TRUE, prob = c(.08, .09, .06, .07, .12, .10, .07, .05, .11, .05))
College <- sample(college_vec, size = 100, replace = TRUE, prob = c(.08, .07, .13, .12, .11, .06, .05, .08, .02, .08))
test.dat1 <- data.frame(Student, College)
I am using the following code to create what I need:
library(dplyr)
test.dat2 <- test.dat1 %>%
  group_by(Student, .drop = FALSE) %>% # group by student
  mutate(semester = row_number()) %>% # number each student's rows in order
  summarise(
    # first college in sctcs
    home_school = College[min(which(College %in% sctcs))],
    # position (semester) of that first sctcs college
    seq_home = min(which(College %in% sctcs)),
    # new_school should be the first non-sctcs school after the sctcs school is found, or the last school for that student
    new_school = if_else(n_distinct(College) > 1,
                         first(College[!(College %in% sctcs) & semester > seq_home]),
                         last(College))
  )
It produces the following table:
I want the NAs in new_school to be filled in with the last college for that student, but I don't know how to get rid of them. If you know an easier way to produce the same result, please share it.
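One possible fix, as a minimal sketch and not something tested against the sampled data above: the NAs seem to appear whenever a student has more than one distinct college but no non-sctcs college after the home school (or no home school at all), so first() has nothing to return. Wrapping that lookup in dplyr's coalesce() falls back to last(College) in exactly those cases, which should also make the n_distinct() check unnecessary. The toy data frame and the names toy and result are made up for illustration. match(TRUE, ...) is swapped in for min(which(...)) as an assumption about what is wanted: a student with no sctcs college then gets NA instead of Inf plus a warning.

```r
library(dplyr)

sctcs <- c("ATC", "CCTC", "DTC", "FDTC", "GTC", "NETC")

# Hypothetical hand-made data (not the sampled data) so the result is easy to check
toy <- data.frame(
  Student = c("A", "A", "A", "B", "B", "C"),
  College = c("USC", "ATC", "Clemson", "GTC", "DTC", "Allen")
)

result <- toy %>%
  group_by(Student) %>%
  summarise(
    # position of the first sctcs college; NA if the student never attended one
    seq_home = match(TRUE, College %in% sctcs),
    home_school = College[seq_home],
    # first non-sctcs college after seq_home, otherwise the student's last college
    new_school = coalesce(
      College[!(College %in% sctcs) & seq_along(College) > seq_home][1],
      last(College)
    )
  )

print(result)
```

In this toy example student A moves from ATC to Clemson, student B never leaves the sctcs schools and so falls back to the last one (DTC), and student C has no home school and falls back to Allen. The same summarise() call could presumably be dropped into the pipeline above in place of the if_else() version.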