I have a data frame as shown below:
test_df <- data.frame("SN" = c("ABC123","ABC123","ABC123","MNO098","MNO098","MNO098"),
"code" = c("ABC1111","DEF222","GHI133","","MNO1123","MNO567"),
"d_time" = c("2220-08-27","2220-05-27","2220-02-27","2220-11-27","2220-02-27",""))
I am trying to do two things:
1) Create two new columns (p_id, v_id) by stripping the letters from the columns SN and code and retaining only 9 digits.
2) Create a lag column (p_vid) based on v_id for each person, sorted by his/her d_time.
This is what I tried:
library(dplyr)
library(stringr)

t_df <- test_df %>% group_by(SN)
t_df %>% arrange((d_time), .by_group = TRUE) ->> sorted_df # sorted based on d_time
transform_ids = function(DF){ # this function is to create person and visit_occurrence ids
  DF %>%
    mutate(p_id = as.integer(str_remove_all(.$SN,"[a-z]|[A-Z]") %>% # retaining only the numeric part
                               str_sub(1,9))) %>%
    mutate(v_id = as.integer(str_remove_all(.$code,"[a-z]|[A-Z]") %>%
                               str_sub(1,9))) %>%
    group_by(p_id) %>%
    mutate(p_vid = lag(v_id)) %>%
    ungroup
}
transform_ids(sorted_df)
But when I do this, I encounter the error below:
Error in View : Column p_id must be length 3 (the group size) or one, not 6
In addition: Warning message: In view(transform_ids(t_df)) :
I expect my output to be as shown below. Basically, I am trying to link each v_id of a person to his/her previous visit, which is p_vid.
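For reference, a likely cause of the error above: the data frame passed to transform_ids is still grouped by SN, and inside mutate the expressions .$SN and .$code refer to the whole piped data frame (6 rows) rather than to the current group (3 rows). The following is a minimal sketch of one possible rewrite, not a definitive fix. It assumes that referring to the columns by bare name, dropping the incoming grouping, and sorting inside the function are acceptable, and that d_time can stay a character column (so the empty d_time sorts first within its group).

```r
library(dplyr)
library(stringr)

test_df <- data.frame(
  SN     = c("ABC123", "ABC123", "ABC123", "MNO098", "MNO098", "MNO098"),
  code   = c("ABC1111", "DEF222", "GHI133", "", "MNO1123", "MNO567"),
  d_time = c("2220-08-27", "2220-05-27", "2220-02-27", "2220-11-27", "2220-02-27", ""),
  stringsAsFactors = FALSE
)

# Sketch: build person and visit ids, then lag the visit id per person.
transform_ids <- function(DF) {
  DF %>%
    ungroup() %>%  # drop any incoming grouping (assumption: it is not needed)
    mutate(
      # bare column names are evaluated per group; .$SN is the full column
      p_id = as.integer(str_sub(str_remove_all(SN, "[A-Za-z]"), 1, 9)),
      v_id = as.integer(str_sub(str_remove_all(code, "[A-Za-z]"), 1, 9))
    ) %>%
    group_by(p_id) %>%
    arrange(d_time, .by_group = TRUE) %>%  # ISO dates sort correctly as text
    mutate(p_vid = lag(v_id)) %>%          # previous visit of the same person
    ungroup()
}

result <- transform_ids(test_df)
print(result)
```

An empty code becomes NA in v_id, and the first visit of each person has NA in p_vid because there is no earlier visit to link to.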