I have a sequence of numeric labels for records that can be shared by a variable number of records per label (labelsequence
). I also have the records themselves, but unfortunately for some of the sequence values, all records have been lost (dataframe df
). I need to identify when a numeric label from labelsequence
does not appear in the label column of df
, copy all records within df
that are associated with the closest label value that is less than the missing value, and append these to a newly filled-in dataframe, say df2
.
I am trying to accomplish this in R (a dplyr answer would be ideal), and have looked at answers to questions regarding filling in missing rows, such as Fill in missing rows in R and fill missing rows in a dataframe, and have a working solution below, was wondering if anyone has a better way of doing this.
Take , for instance, this example data:
labelsequence<-data.frame(label=c(1,2,3,4,5,6))
and
df<-data.frame(label=c(1,1,1,1,3,3,4,4,4),
place=c('vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','lamb',
'cougar','donkey','lamb','wolf'))
with desired result...
desired_df2<-data.frame(label=c(1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6),
place=c('vermont','kentucky',
'wisconsin','wyoming','vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky','utah',
'georgia','kentucky','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','wolf',
'wolf','cougar','cougar','lamb','cougar',
'donkey','lamb','wolf','donkey','lamb','wolf',
'donkey','lamb','wolf'))
Is there a better (be it effiency of code, flexibility, or resource efficiency) way than the following?
df2<- df %>%
full_join(expand.grid(label=unique(df$label),newlabel=labelsequence$label)) %>%
mutate(missing = ifelse(newlabel %in% label,0,1))%>%
filter(label<newlabel)%>%
group_by(newlabel) %>%
filter(label==max(label) & missing ==1) %>%
ungroup()%>%
mutate(label=newlabel,missing=NULL,newlabel=NULL) %>%
bind_rows(df) %>%
arrange(label)