I have an already ordered data frame that looks like the following:
mydf <- data.frame(ID="A1", Level=c("domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"), Taxonomy=c("D__Eukaryota","K__Chloroplastida",NA,"C__Mamiellophyceae",NA,NA,"G__Crustomastix","S__Crustomastix sp. MBIC10709"), Letter=c("D","K","P","C","O","F","G","S"))
  ID   Level                      Taxonomy Letter
1 A1  domain                  D__Eukaryota      D
2 A1 kingdom             K__Chloroplastida      K
3 A1  phylum                          <NA>      P
4 A1   class            C__Mamiellophyceae      C
5 A1   order                          <NA>      O
6 A1  family                          <NA>      F
7 A1   genus               G__Crustomastix      G
8 A1 species S__Crustomastix sp. MBIC10709      S
What I would like is to replace each NA with the last non-NA value, prefixed with all the Letters that were "missed" along the way, in a rolling fashion. See what I mean below.
The goal is to obtain a data frame like this:
  ID   Level                      Taxonomy Letter
1 A1  domain                  D__Eukaryota      D
2 A1 kingdom             K__Chloroplastida      K
3 A1  phylum          P__K__Chloroplastida      P
4 A1   class            C__Mamiellophyceae      C
5 A1   order         O__C__Mamiellophyceae      O
6 A1  family      F__O__C__Mamiellophyceae      F
7 A1   genus               G__Crustomastix      G
8 A1 species S__Crustomastix sp. MBIC10709      S
Notice the last two NAs: the second one has to carry over the value built for the first. The first of the two starts with O__C and the second with F__O__C.
So far, my best attempt is the following (thanks to Ajay Ohri):
library(zoo)
mydf <- data.frame(ID="A1", Level=c("domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"), Taxonomy=c("D__Eukaryota","K__Chloroplastida",NA,"C__Mamiellophyceae",NA,NA,"G__Crustomastix","S__Crustomastix sp. MBIC10709"), Letter=c("D","K","P","C","O","F","G","S"))
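# Convert every column to character (avoids factor columns in R < 4.0)
mydf <- data.frame(lapply(mydf, as.character), stringsAsFactors=FALSE)
# Prefix to add: the row's Letter plus "__" where Taxonomy is NA, else empty
mydf$Letter2 <- ifelse(is.na(mydf$Taxonomy),paste(mydf$Letter,'__',sep=''),"")
mydf
# Fill each NA with the last non-NA Taxonomy, then prepend the prefix
mydf$Taxonomy <- paste(mydf$Letter2, na.locf(mydf$Taxonomy), sep='')
mydf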
Notice that I still haven't managed to do it in a rolling manner (I get F__C instead of F__O__C for the last NA). Any help? Thanks!
PS: let me know if this is still confusing, and I will make another MWE with more NAs in a row so that it's more obvious what I need.
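To make the intended rolling behaviour concrete, here is a minimal sketch in base R using a plain loop. The helper name roll_fill is hypothetical, and the sketch assumes a single ID whose rows are already ordered from domain to species; with several IDs it would have to be applied to each ID separately. An NA in the very first row is left as NA, since there is nothing to carry forward.

```r
mydf <- data.frame(
  ID = "A1",
  Level = c("domain", "kingdom", "phylum", "class",
            "order", "family", "genus", "species"),
  Taxonomy = c("D__Eukaryota", "K__Chloroplastida", NA, "C__Mamiellophyceae",
               NA, NA, "G__Crustomastix", "S__Crustomastix sp. MBIC10709"),
  Letter = c("D", "K", "P", "C", "O", "F", "G", "S"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk down the rows and, for each NA, prepend the
# row's letter to the (possibly already filled) value of the row above.
roll_fill <- function(taxonomy, letter) {
  for (i in seq_along(taxonomy)) {
    if (i > 1 && is.na(taxonomy[i]) && !is.na(taxonomy[i - 1])) {
      taxonomy[i] <- paste0(letter[i], "__", taxonomy[i - 1])
    }
  }
  taxonomy
}

mydf$Taxonomy <- roll_fill(mydf$Taxonomy, mydf$Letter)
mydf
```

Because each filled value is written back before the next row is visited, consecutive NAs accumulate their letters, which is the part that na.locf alone does not do.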