
I have a dataframe that looks like this:

chr    alleles    position
2      [A/T]      123456
3      [C/T]      5678910
8      [A/G]      8765435334

I'd like to load each row into variables such as:

library('BSgenome.Hsapiens.UCSC.hg19')
chr <- 'chr2'
alleles <- '[T/C]'
position <- 123456
offset <- 60

and then use them iteratively in:

seq <- paste(getSeq(Hsapiens,chr,position-offset,position-1),
             alleles,
             getSeq(Hsapiens,chr,position+1,position+offset),
             sep='')

and finally have the output as another dataframe containing:

chr    alleles    position    seq
2      [A/T]      123456      "ACTTGGAGATTTGGAGGAAGCTCCAGAGAGAGAGAGGCTTCCCAGCGTGGACTTGAAAGA[A/T]GAAACCAGCATAGATAGCACCGTGAATGGTGAGTTGGAATTCCTGGTTTCACTTTTGTTA"

I have read this thread, but I would appreciate a solution that doesn't require indices!

RJF
  • Is `getSeq()` from a loaded package (in which case, which one?) or a function you've created (in which case could you add the source code for that function to your question)? – Phil Nov 14 '17 at 17:32
  • Also, is `Hsapiens` the name of your data frame? – Phil Nov 14 '17 at 17:33
  • @Phil Apologies for the confusion, yes getSeq is from the `BSgenome.Hsapiens.UCSC.hg19` library and `Hsapiens` is an attribute from the loaded packages. – RJF Nov 14 '17 at 17:49
  • Can you update your desired output to include some data in addition to the column names? – cparmstrong Nov 14 '17 at 18:05
  • @seeellayewhy, thanks for your comment. I edited the result section! – RJF Nov 15 '17 at 10:40

1 Answer


I think you should be using a map()-type function from purrr.

I don't have access to your getSeq() function or your Hsapiens data, but something like this should work if I understand your problem correctly.

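library(dplyr)   # provides %>% and mutate()
library(purrr)   # provides pmap_chr()

# define a helper function to simplify the syntax and make the code readable
seq_extractor <- function(data, chr, position, alleles, offset=60){
    # flanking sequence on each side of the variant position
    pre_seq <- getSeq(data, chr, position-offset, position-1)
    post_seq <- getSeq(data, chr, position+1, position+offset)
    paste(pre_seq, alleles, post_seq, sep='')
}

# use pmap_chr() to map your function onto your existing data;
# ..1, ..2, ..3 are chr, alleles, position in the order listed.
# paste0('chr', ..1) assumes the chr column holds bare numbers (2, 3, 8),
# as in the question, while hg19 sequence names look like 'chr2'
df %>%
    mutate(seq = pmap_chr(list(chr, alleles, position),
                          ~seq_extractor(Hsapiens, paste0('chr', ..1), ..3, ..2)))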
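
If you'd rather not depend on purrr, the same row-wise idea can be sketched in base R with mapply(). I can't run BSgenome here, so `toy_genome` and `toy_get_seq()` below are made-up stand-ins for `Hsapiens` and `getSeq()`, and the positions and offset are shrunk to fit the toy sequences. Treat it as an illustration of the pattern, not as tested code for your real data.

```r
# Hypothetical stand-in for the genome: one short made-up sequence per chromosome.
toy_genome <- c(chr2 = "ACGTACGTACGTACGTACGT",
                chr3 = "TTTTGGGGCCCCAAAATTTT")

# Hypothetical stand-in for getSeq(): bases from start to end (1-based, inclusive).
toy_get_seq <- function(genome, chr, start, end) {
  substr(genome[[chr]], start, end)
}

# Build left flank + alleles + right flank for a single variant.
seq_extractor <- function(genome, chr, position, alleles, offset = 3) {
  pre_seq  <- toy_get_seq(genome, chr, position - offset, position - 1)
  post_seq <- toy_get_seq(genome, chr, position + 1, position + offset)
  paste0(pre_seq, alleles, post_seq)
}

# Toy input shaped like the data frame in the question (values are made up).
df <- data.frame(chr = c(2, 3),
                 alleles = c("[A/T]", "[C/T]"),
                 position = c(5, 10),
                 stringsAsFactors = FALSE)

# mapply() walks the three columns in parallel, one row at a time,
# so no explicit row indices are needed.
df$seq <- mapply(
  function(chr, alleles, position) {
    seq_extractor(toy_genome, paste0("chr", chr), position, alleles)
  },
  df$chr, df$alleles, df$position,
  USE.NAMES = FALSE
)

print(df)
```

With the real package you would swap `toy_get_seq()` for `getSeq()` and `toy_genome` for `Hsapiens`, and put the offset back to 60.
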
Curt F.