I have some patterns:

A <- c("A..A","A.A","AA")
B <- c("B..B","B.B","BB")

and some sequences with their frequencies in a data.frame:

Seq     freq
CACCA   1
CAACC   2
BCCBC   3
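
For reproducibility, the table above could be built as shown in this minimal sketch. The object name `df` is an assumption borrowed from the answer further down, and `stringsAsFactors = FALSE` is added so that `Seq` stays a character column regardless of the R version.

```r
# Assumed name `df` (the answer refers to the data frame by this name)
df <- data.frame(
  Seq  = c("CACCA", "CAACC", "BCCBC"),
  freq = c(1, 2, 3),
  stringsAsFactors = FALSE  # keep Seq as character, not factor
)
```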

I need to match the patterns against the sequences, then extract the matched pattern and record which pattern group it came from, as follows:

Seq     freq   Pattern   From
CACCA   1      A..A      A
CAACC   2      AA        A
BCCBC   3      B..B      B

I used grep to match the patterns, but it only returns the whole sequence. How can I extract the matched pattern and get its pattern group?

Thank you!

Sotos
Xiao-yan Pan

1 Answer

You will need to put A and B in a data frame and stack it so it's in long format.

d1 <- stack(data.frame(A, B, stringsAsFactors = FALSE))
#  values ind
#1   A..A   A
#2    A.A   A
#3     AA   A
#4   B..B   B
#5    B.B   B
#6     BB   B    

# use gsub to convert Seq to the same format as A and B:
# replace every letter C-Z with a space, trim the ends, then turn the remaining spaces into dots
df$v1 <- gsub(' ', '.', trimws(gsub('[C-Z]', ' ', df$Seq)))
#which gives [1] "A..A" "AA"   "B..B"

df$From <- d1$ind[match(df$v1, d1$values)]

df
#    Seq freq   v1 From
#1 CACCA    1 A..A    A
#2 CAACC    2   AA    A
#3 BCCBC    3 B..B    B
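
As an alternative sketch (not the method used above), the patterns can be treated as regular expressions and tested against each sequence with `grepl`. The names `patterns` and `first_hit` are hypothetical. Note one assumption that differs from the `gsub` approach: in a regex, `.` matches any character, including `A` or `B`, so the order of the patterns decides which one wins when several match.

```r
A <- c("A..A", "A.A", "AA")
B <- c("B..B", "B.B", "BB")
df <- data.frame(Seq = c("CACCA", "CAACC", "BCCBC"), freq = 1:3,
                 stringsAsFactors = FALSE)

# Long-format lookup table: one row per pattern, with its group in `ind`
patterns <- stack(list(A = A, B = B))
patterns$values <- as.character(patterns$values)

# For each sequence, index of the first pattern that matches as a regex (NA if none)
first_hit <- vapply(df$Seq, function(s) {
  hits <- which(vapply(patterns$values, grepl, logical(1), x = s))
  if (length(hits)) hits[1] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

df$Pattern <- patterns$values[first_hit]
df$From    <- as.character(patterns$ind[first_hit])
df
```

Listing longer patterns before shorter ones within each group, as in the question, makes the most specific match come first.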
Sotos
  • Thank you, Sotos. I have a list of pattern groups that contain different numbers of patterns. stack returns an error: arguments imply differing number of rows: 3, 1, 5. I can convert the pattern groups to data.frames, rename them and use rbind, but I wonder whether there is a neater way to do this? – Xiao-yan Pan May 16 '17 at 12:23
  • Can you give an example of that list? – Sotos May 16 '17 at 12:31
  • @Xiao-yanPan the standard way to rbind vectors of different lengths held in a list is to make their lengths the same first. [This should clarify](http://stackoverflow.com/questions/34570860/adding-na-to-make-all-list-elements-equal-length) – Sotos May 16 '17 at 12:47
  • Hi Sotos, I use data= list (patternA, patternB, patternC) then stack(unlist(data, recursive=FALSE)), and that solved the problem. Thank you again – Xiao-yan Pan May 16 '17 at 13:09
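
For the unequal-length case raised in the comments, one possible sketch is to skip `data.frame()` altogether and pass a named list straight to `stack()`, which accepts list elements of different lengths. The group contents and the name `pattern_groups` below are made up for illustration; only the lengths 3, 1 and 5 are taken from the error message quoted above.

```r
# Hypothetical pattern groups of different lengths (3, 1, 5)
pattern_groups <- list(
  A = c("A..A", "A.A", "AA"),
  B = "BB",
  C = c("C....C", "C...C", "C..C", "C.C", "CC")
)

# stack() on a named list needs no padding; list names become the `ind` column
d1 <- stack(pattern_groups)
d1
```

The resulting `d1` has the same `values`/`ind` layout as in the answer, so the `match()` lookup can be used unchanged.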