3

I have a data.table with sequences and number of reads, like so:

   sequence   num_reads
1: AACCTGCCG          1
2: CGCGCTCAA         12
3: AGTGTGAGC          3
4: TGGGTACAC         11
5: GGCCGCGTG         15
6: CCTTAAGAG          2
7: GCGGAACTG          9
8: GCGTTGTAG         17
9: GTTGTAGCG         20
10: ACACGTGAC        16

I'd like to use data.table to add two new columns to this table, based on the results of applying dpois() with two weights and two lambdas. The correct output should be this (based on using data.frame):

sequence     num_reads                       clus1                       clus2
1  AACCTGCCG         1 2.553269503552647000377e-03 1.610220613932057849571e-03
2  CGCGCTCAA        12 1.053993989051599418361e-02 2.887608256917401083896e-02
3  AGTGTGAGC         3 2.085170094567994833468e-02 1.717568654860860896672e-02
4  TGGGTACAC        11 1.806846838374168498498e-02 4.331412385376097462508e-02
5  GGCCGCGTG        15 1.324248858039188620275e-03 5.415587646672919558410e-03
6  CCTTAAGAG         2 8.936443262434262332916e-03 6.440882455728230530922e-03
7  GCGGAACTG         9 4.056186780023639942838e-02 7.444615037365168164207e-02
8  GCGTTGTAG        17 2.385595369261770803265e-04 1.274255916864215588610e-03
9  GTTGTAGCG        20 1.196285397159046524451e-05 9.538289904012846548518e-05
10 ACACGTGAC        16 5.793588753921446012421e-04 2.707793823336458478163e-03

But when I try to use data.table I can't seem to get the right result. Here is what I tried (based on similar questions asked around this topic):

    pois = function(n, p, l){return(dpois(as.numeric(as.character(n)), l)*p) }
    x = x[, c(paste("clus", seq(1,2), sep = '')) := pois(num_reads, c(0.4,0.6), c(7,8)), by = seq_len(nrow(x))]

And here is the result:

          sequence num_reads                             clus1                      clus2
    1:   AACCTGCCG         1       2.553269503552647000377e-03 2.553269503552647000377e-03
    2:   CGCGCTCAA        12       1.053993989051599418361e-02 1.053993989051599418361e-02
    3:   AGTGTGAGC         3       2.085170094567994833468e-02 2.085170094567994833468e-02
    4:   TGGGTACAC        11       1.806846838374168498498e-02 1.806846838374168498498e-02
    5:   GGCCGCGTG        15       1.324248858039188620275e-03 1.324248858039188620275e-03
    6:   CCTTAAGAG         2       8.936443262434262332916e-03 8.936443262434262332916e-03
    7:   GCGGAACTG         9       4.056186780023639942838e-02 4.056186780023639942838e-02
    8:   GCGTTGTAG        17       2.385595369261770803265e-04 2.385595369261770803265e-04
    9:   GCGTTGTAG        20       1.196285397159046524451e-05 1.196285397159046524451e-05
    10:  ACACGTGAC        16       5.793588753921446012421e-04 5.793588753921446012421e-04

The reason I'm using data.table and not data.frame is that my real data has 100,000s of rows. I studied the answers to this and this but I haven't been able to come up with a solution.

Any tips you have would be much appreciated. Thanks!

Community
  • 1
  • 1
sqlck
  • 87
  • 1
  • 7

1 Answers1

1

We can try

x[,  paste("clus", seq(1,2), sep = ''):= 
      as.list(pois(num_reads, c(0.4,0.6), c(7,8))), by = seq_len(nrow(x))]
x
#    sequence num_reads         clus1        clus2
#1: AACCTGCCG         1 0.00255326950 0.0016102206
#2: CGCGCTCAA        12 0.01053993989 0.0288760826
#3: AGTGTGAGC         3 0.02085170095 0.0171756865
#4: TGGGTACAC        11 0.01806846838 0.0433141239
#5: GGCCGCGTG        15 0.00132424886 0.0054155876
#6: CCTTAAGAG         2 0.00893644326 0.0064408825
#7: GCGGAACTG         9 0.04056186780 0.0744461504
#8: GCGTTGTAG        17 0.00023855954 0.0012742559
#9: GTTGTAGCG        20 0.00001196285 0.0000953829
#10:ACACGTGAC        16 0.00057935888 0.0027077938
akrun
  • 874,273
  • 37
  • 540
  • 662
  • this worked exactly as intended- thank you for your prompt response. I'm curious though- why is it necessary to coerce the function output to a list? – sqlck Feb 06 '17 at 02:26
  • @sqlck - `list` objects are converted to columns in `data.table` - that's just how it works. The same is true of `data.frame`s too - `data.frame(list(1,2,3))` for instance. – thelatemail Feb 06 '17 at 02:51