I have a data frame that looks like this:
df <- data.frame(V1 = c(">NP_abc d", "1", "Efg", "hij", "2", "3KL",
"VGSFNGWDGRRHPMRLRHPTGVWEIFVPRLQPGEVYKYEILGAHGILPLKSDPMALATTLPPDTASKISAPLKFEWHDQD",
">WP_mno p", "Rst", "uw", "adb", "cgi",
"PGLGIRLDKLASFVVTLSLYAGAYLTEVFRAGLLSIHKGQREAGLAIGLGEWQVRAYIIVPVMLRNVLPALSNNFISLFK",
">GP_hgs i Hhh yuy",
"WEGLETPVQVVWRHALLPVIELPLAALHDPEPLNLLDAPLLRLVHAEDPDNQRIVAVLLFHHLIMDHVALDLLSHELQAV"))
I want to remove the rows that occur between two flanking rows. The first flanking row contains a string that always starts with the fixed character ">". The second flanking row contains a string of variable characters with a fixed length of 80.
Here is a secondary objective. If any of the rows tagged for removal contains a string that matches a string stored in another vector, then it should be appended to the string starting with ">" in the flanking row. In addition, the strings should be appended in the order of their occurrence in the rows targeted for removal, and each should be preceded by a blank space.
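To make the first objective concrete, here is a minimal base R sketch of the row filter described above. It uses a small stand-in data frame (`toy`, with 80-character rows built by `strrep()`) rather than my real data, and the names `toy`, `is_header`, `is_seq` and `kept` are just illustrative. It assumes that none of the rows to be removed happens to be exactly 80 characters long.

```r
# Toy stand-in for the real data; strrep() builds 80-character sequence rows.
seq1 <- strrep("A", 80)
seq2 <- strrep("C", 80)
toy <- data.frame(V1 = c(">NP_abc d", "1", "Efg", "hij", seq1,
                         ">WP_mno p", "Rst", "uw", seq2),
                  stringsAsFactors = FALSE)

is_header <- startsWith(toy$V1, ">")  # rows beginning with ">"
is_seq    <- nchar(toy$V1) == 80      # rows exactly 80 characters long

# Keep only header and sequence rows; the rows in between are dropped.
kept <- toy[is_header | is_seq, , drop = FALSE]
rownames(kept) <- NULL
```

Here `kept` should hold four rows: the two ">" rows, each followed by its 80-character row.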
Here is the vector for matching the strings:
some_vector <- c("Efg", "hij", "Rst", "adb")
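For illustration, the two objectives together could be sketched in base R roughly as follows. This is only my attempt at expressing the logic, not a tested solution for the real data. It assumes the first row is always a ">" row, that "matches" means the whole row equals an entry of `some_vector` (not a substring match), and that no row to be removed is exactly 80 characters long. The names `toy`, `records` and `out` are made up for this sketch.

```r
# Toy stand-in for the real data; strrep() builds 80-character sequence rows.
seq1 <- strrep("A", 80)
seq2 <- strrep("C", 80)
seq3 <- strrep("G", 80)
toy <- data.frame(V1 = c(">NP_abc d", "1", "Efg", "hij", "2", "3KL", seq1,
                         ">WP_mno p", "Rst", "uw", "adb", "cgi", seq2,
                         ">GP_hgs i Hhh yuy", seq3),
                  stringsAsFactors = FALSE)
some_vector <- c("Efg", "hij", "Rst", "adb")

x <- toy$V1
is_header <- startsWith(x, ">")

# Split the rows into records; a new record starts at every ">" row.
records <- split(x, cumsum(is_header))

result <- unlist(lapply(records, function(rec) {
  header   <- rec[1]
  # The 80-character row(s) of this record, excluding the header itself.
  sequence <- rec[nchar(rec) == 80 & !startsWith(rec, ">")]
  # Rows between the header and the sequence, in their original order.
  middle   <- rec[-1]
  middle   <- middle[nchar(middle) != 80]
  # Whole-row matches against some_vector, order of occurrence preserved.
  hits     <- middle[middle %in% some_vector]
  # Append the matches to the header, each preceded by a single space.
  c(paste(c(header, hits), collapse = " "), sequence)
}), use.names = FALSE)

out <- data.frame(V1 = result, stringsAsFactors = FALSE)
```

A record without any matching rows keeps its header unchanged, because `paste()` then only sees the header.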
This is what the output should look like:
df1 <- data.frame(V1 = c(">NP_abc d Efg hij",
"VGSFNGWDGRRHPMRLRHPTGVWEIFVPRLQPGEVYKYEILGAHGILPLKSDPMALATTLPPDTASKISAPLKFEWHDQD",
">WP_mno p Rst adb",
"PGLGIRLDKLASFVVTLSLYAGAYLTEVFRAGLLSIHKGQREAGLAIGLGEWQVRAYIIVPVMLRNVLPALSNNFISLFK",
">GP_hgs i Hhh yuy",
"WEGLETPVQVVWRHALLPVIELPLAALHDPEPLNLLDAPLLRLVHAEDPDNQRIVAVLLFHHLIMDHVALDLLSHELQAV"))
Please advise. This problem is way over my head, and I haven't been able to get beyond working out the logic that should drive the transformation of my data frame.
Best regards!