
I have a translation table (trans_df):

trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
                       G           C          G         A         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C        del        CTT         T          C          C
                       A           C          G         A         C         T          T        CTT         T          C          C
                     del         del        del       del       del       del        del        del       del        del        del
                       G           C          G       del         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C          T        CTT         G          C          C
                       G           C          G         A         C         C          T        del         T          C          C
                       A           C          G         A         C         C          T        CTT         T          C          C
                       G           C          A         A         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C          T        CTT         T          C          T
                       G           C          G         A         C         C          T        CTT         T          T          C",header=TRUE, stringsAsFactors = FALSE, colClasses = "character")

and input :

    input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
                       G|A           C        G|A         A         C       T|C          T        CTT         T        C|T          C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

I want to find the rows of trans_df that match the input row, treating each input cell as a regular expression. I have achieved this column by column:

Reduce(intersect,lapply(seq(1, ncol(trans_df)), 
                          function(i) {grep(pattern = input[, i], 
                          trans_df[, i])}))

Is there a way to do this in one step, passing the whole input row as the pattern (pattern = input)? Please advise.

Wiktor Stribiżew
Dr. Richard Tennen

2 Answers


I would just use subset() here and pass it the criteria for a matching row. In this case, the criteria amount to checking each column of the data frame against a known value. Assuming that input is a named vector, we can try the following code:

subset(trans_df, rs1065852 == input["rs1065852"] & rs201377835 == input["rs201377835"] &
       ... & rs59421388 == input["rs59421388"])
Tim Biegeleisen
  • If rs1065852 is A or G (= "A|G"), I need to check whether trans_df has A or G in this position. How can this be achieved using your solution? – Dr. Richard Tennen Dec 27 '17 at 10:27
  • @Dr.RichardTennen It starts getting ugly now: Use `(rs1065852 == 'A' | rs1065852 == 'G') & ` ... but this is already a departure from your question. Your question implies that you have a vector or maybe data frame of values, one for each column, and you want to extract rows from `trans_df` using that input. – Tim Biegeleisen Dec 27 '17 at 10:29
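
The alternation case raised in the comments (an input cell such as "G|A") can also be handled without regular expressions and without typing out every column. The following is only a sketch under assumptions: the toy `trans_df` and `input` below are hypothetical three-column stand-ins for the real tables, and `|` is assumed to be the only separator between alternatives. Each input cell is split on `|` and compared with `%in%`, so every comparison is an exact match.

```r
# Hypothetical toy data standing in for the real trans_df and input
trans_df <- data.frame(rs1 = c("G", "A", "del"),
                       rs2 = c("C", "T", "del"),
                       rs3 = c("CTT", "CTT", "del"),
                       stringsAsFactors = FALSE)
input <- data.frame(rs1 = "G|A", rs2 = "T|C", rs3 = "CTT",
                    stringsAsFactors = FALSE)

# For each column, split the input cell on a literal "|" and test
# whether the column value is one of the allowed alternatives
col_ok <- Map(function(col, cell) col %in% strsplit(cell, "|", fixed = TRUE)[[1]],
              trans_df, input[names(trans_df)])

# A row matches only if it matches in every column
keep <- Reduce(`&`, col_ok)
matches <- trans_df[keep, ]
```

Indexing `input` by `names(trans_df)` aligns the columns by name, so the two tables do not need to share the same column order.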

You can use Map to achieve that, i.e.

Map(grep, input, trans_df)

However, that assumes that your columns match one-to-one, in the same order. If that does not hold, you can use match to align them, i.e.

Map(grep, input[match(names(trans_df), names(input))], trans_df)
# or, in the same sense and to keep input intact,
Map(grep, input, trans_df[match(names(input), names(trans_df))])

However, I think that would defeat your purpose.
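
Note that `Map(grep, input, trans_df)` returns one vector of matching row indices per column, so it still has to be combined with `Reduce(intersect, ...)`, as in the question, to get the rows that match in every column. The sketch below shows this on hypothetical toy data (two made-up columns, not the real tables). It also wraps each pattern in `^(...)$`, which is an addition of mine rather than part of the original answer: without the anchors, a pattern such as "C" also matches "CTT".

```r
# Hypothetical toy data standing in for the real trans_df and input
trans_df <- data.frame(rs1 = c("G", "A", "del"),
                       rs2 = c("CTT", "C", "CTT"),
                       stringsAsFactors = FALSE)
input <- data.frame(rs1 = "G|A", rs2 = "C", stringsAsFactors = FALSE)

# Unanchored patterns: "C" matches both "C" and "CTT"
loose <- Reduce(intersect, Map(grep, input, trans_df))

# Anchored patterns: each cell must match the whole pattern
anchored <- lapply(input, function(p) paste0("^(", p, ")$"))
strict <- Reduce(intersect, Map(grep, anchored, trans_df))
```

Here `loose` keeps rows 1 and 2, while `strict` keeps only row 2, the single row whose second column is exactly "C".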

Sotos