I want to extract only the Scaffold token and its number (e.g. Scaffold6) from each string. Any ideas?

[2] "KQ415657.1 isolate UCB-ISO-001 unplaced genomic scaffold Scaffold5, whole genome shotgun sequence"
[3] "ABCD0100000.1  isolate UCB-ISO-001 Scaffold6_contig_1, whole genome shotgun sequence"
[4] "ABCDD0100001.1  isolate UCB-ISO-001 Scaffold8_contig_1, whole genome shotgun sequence"
[5] "ABCD0100002.1  isolate UCB-ISO-001 Scaffold2_contig_1, whole genome shotgun sequence"
[6] "ABCD0100003.1  isolate UCB-ISO-001 Scaffold6_contig_1, whole genome shotgun sequence"
[7] "ABCD0100004.1  isolate UCB-ISO-001 Scaffold2_contig_1, whole genome shotgun sequence"
[8] "ABCD0100005.1  isolate UCB-ISO-001 Scaffold7_contig_1, whole genome shotgun sequence"
[9] "ABCD0100006.1  isolate UCB-ISO-001 Scaffold8_contig_1, whole genome shotgun sequence"
Su. Doku

2 Answers

Is this stored as a character vector or in a data.frame? Does each line always contain a Scaffold string?

If it's just a vector:

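library(magrittr)  # provides the %>% pipe

STRING = c("This is some vector Scaffold1", "some Scaffold20 string with stuff")

# Split each string on spaces, then keep only the tokens containing "Scaffold"
stringr::str_split(string = STRING, pattern = " ") %>% 
    lapply(function(x) x[grepl("Scaffold", x)]) %>% 
    unlist()
[1] "Scaffold1"  "Scaffold20"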
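
One caveat with splitting on spaces: on the strings from the question, the matching tokens would come back as "Scaffold5," and "Scaffold6_contig_1,", with the comma and contig suffix still attached. A minimal base R sketch that matches only the word plus its digits, assuming every string contains exactly one such token (the variable names `x`, `scaffold` and `scaffold_number` are just for illustration):

```r
# Two of the strings shown in the question.
x <- c(
  "KQ415657.1 isolate UCB-ISO-001 unplaced genomic scaffold Scaffold5, whole genome shotgun sequence",
  "ABCD0100000.1  isolate UCB-ISO-001 Scaffold6_contig_1, whole genome shotgun sequence"
)

# regexpr() locates the first match in each string; regmatches() extracts it.
# The pattern is case-sensitive, so the lowercase word "scaffold" is skipped.
scaffold <- regmatches(x, regexpr("Scaffold[0-9]+", x))
print(scaffold)

# The number on its own, as an integer.
scaffold_number <- as.integer(sub("Scaffold", "", scaffold))
print(scaffold_number)
```

Note that `regmatches()` silently drops strings without a match, so the result can be shorter than the input if some lines lack a Scaffold token.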

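If you can have it in a data.frame, it might be neater:

library(tidyverse)
# Split each string into up to 8 columns (V1..V8), reshape to long format,
# and keep only the pieces containing "Scaffold".
# Strings with more than 8 pieces need more columns, or the extra pieces are dropped.
data.frame(String = STRING, stringsAsFactors = FALSE) %>% 
    separate(String, paste0("V", 1:8), remove = FALSE) %>% 
    gather(key, val, starts_with("V")) %>% 
    filter(grepl("Scaffold", val)) %>% 
    select(-key)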
                             String        val
1 some Scaffold20 string with stuff Scaffold20
2     This is some vector Scaffold1  Scaffold1
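
If the strings vary in length, fixing the number of columns up front is easy to get wrong. As a possible alternative, here is a base R sketch that adds the match as a new column, keeps the original row order, and leaves `NA` where no Scaffold token is found. The third string is a made-up example added only to show the no-match case.

```r
df <- data.frame(
  String = c("This is some vector Scaffold1",
             "some Scaffold20 string with stuff",
             "no match here"),  # hypothetical row without a Scaffold token
  stringsAsFactors = FALSE
)

# regexpr() returns -1 for strings without a match.
m <- regexpr("Scaffold[0-9]+", df$String)

# Start with NA everywhere, then fill in only the rows that matched.
df$val <- NA_character_
df$val[m > 0] <- regmatches(df$String, m)
print(df)
```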

Taking the example strings from Athanasia Mowinckel's answer, here is a sapply option:

library(stringr)  # for str_extract_all()

STRING = c("This is some vector Scaffold1", "some Scaffold20 string with stuff")
sapply(str_extract_all(STRING,"Sca.*[0-9]"),"[")
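
The pattern above is greedy: `.*` runs on to the last digit in the string, which works for the toy strings but would return "Scaffold6_contig_1" for some of the strings in the question. A hedged sketch with a tighter pattern, assuming the token is always the word "Scaffold" followed directly by digits (the names `greedy`, `tight` and `number` are illustrative only):

```r
library(stringr)

# Two of the strings shown in the question.
x <- c(
  "ABCD0100000.1  isolate UCB-ISO-001 Scaffold6_contig_1, whole genome shotgun sequence",
  "KQ415657.1 isolate UCB-ISO-001 unplaced genomic scaffold Scaffold5, whole genome shotgun sequence"
)

# Greedy pattern: matches from "Sca" up to the last digit in the string.
greedy <- str_extract(x, "Sca.*[0-9]")
print(greedy)

# Tighter pattern: the literal word followed only by digits.
tight <- str_extract(x, "Scaffold[0-9]+")
print(tight)

# Just the number, using a lookbehind so "Scaffold" itself is not returned.
number <- as.integer(str_extract(x, "(?<=Scaffold)[0-9]+"))
print(number)
```

`str_extract()` returns one value per input string and `NA` where nothing matches, so the result stays aligned with the input.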
NelsonGon