type conversion - type coercion only in loop?

Question

Starting from the following working construct:

sum(sapply(DNAStringSet(seq_set[, 1]), function(s)
  countPWM(motifs[[1]], reverseComplement(s), min.score = "75%")))

I wrote this loop:

percentages <- as.character(seq(0, 100, 5))

for (i in 1:length(percentages)) {
  sum(sapply(DNAStringSet(seq_set[, 1]), function(s)
    countPWM(
      motifs[[1]],
      reverseComplement(s),
      min.score = as.character(cat('"', percentages[i], "%" ,  '"', sep = "")
    ))))
}

It returns the following error:

 Error in .normargMinScore(min.score, pwm) : 
  'min.score' must be a single number or string

I realize there is a problem with the data type of `min.score`, but when I check:

test <- as.character(cat('"', percentages[1], "%" ,  '"', sep = ""))
typeof(test)


> typeof(test)
[1] "character"

it seems to be in order.

I thought it might be related to type coercion, as described on R-bloggers, because of the `sapply` call, but that does not seem to be the cause.

Help would be greatly appreciated, since I am still new to R and programming.

My sessionInfo():

R version 3.2.5 (2016-04-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
[1] Biostrings_2.38.4   XVector_0.10.0      IRanges_2.4.8      
[4] S4Vectors_0.8.11    BiocGenerics_0.16.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.16.0 tools_3.2.5

This is how I constructed my data:

seq_set <- matrix(1:2000, 1000, 2)
seq_set[, 1] <-
  sapply(seq_set[, 1], function(s)
    paste(sample(
      c('A', 'C', 'G', 'T'),
      size = ncol(motifs[[1]]),
      replace = T
    ), collapse = ''))
seq_set[, 2] <-
  sapply(seq_set[, 2], function(s)
    paste(sample(
      c('A', 'C', 'G', 'T'),
      size = ncol(motifs[[2]]),
      replace = T
    ), collapse = ''))

These are the packages in my library:

AnnotationDbi                 Annotation Database Interface
Biobase                       Biobase: Base functions for Bioconductor
BiocGenerics                  S4 generic functions for Bioconductor
BiocInstaller                 Install/Update Bioconductor, CRAN, and github Packages
BiocParallel                  Bioconductor facilities for parallel evaluation
Biostrings                    String objects representing biological sequences, and
                              matching algorithms
bitops                        Bitwise Operations
BSgenome                      Infrastructure for Biostrings-based genome data packages and
                              support for efficient SNP representation
caTools                       Tools: moving window statistics, GIF, Base64, ROC AUC, etc.
CNEr                          CNE Detection and Visualization
DBI                           R Database Interface
DirichletMultinomial          Dirichlet-Multinomial Mixture Model Machine Learning for
                              Microbiome Data
futile.logger                 A Logging Utility for R
futile.options                Futile options management
GenomeInfoDb                  Utilities for manipulating chromosome and other 'seqname'
                              identifiers
GenomicAlignments             Representation and manipulation of short genomic alignments
GenomicRanges                 Representation and manipulation of genomic intervals and
                              variables defined along a genome
gtools                        Various R Programming Tools
IRanges                       Infrastructure for manipulating intervals on sequences
lambda.r                      Modeling Data with Functional Programming
Rcpp                          Seamless R and C++ Integration
RCurl                         General Network (HTTP/FTP/...) Client Interface for R
Rsamtools                     Binary alignment (BAM), FASTA, variant call (BCF), and tabix
                              file import
RSQLite                       SQLite Interface for R
rtracklayer                   R interface to genome browsers and their annotation tracks
S4Vectors                     S4 implementation of vectors and lists
seqLogo                       Sequence logos for DNA sequence alignments
snow                          Simple Network of Workstations
SummarizedExperiment          SummarizedExperiment container
TFBSTools                     Software Package for Transcription Factor Binding Site
                              (TFBS) Analysis
TFMPvalue                     Efficient and Accurate P-Value Computation for Position
                              Weight Matrices
XML                           Tools for Parsing and Generating XML Within R and S-Plus
XVector                       Representation and manpulation of external sequences
zlibbioc                      An R packaged zlib-1.2.5

Packages in library ‘/usr/lib/R/library’:

base                          The R Base Package
boot                          Bootstrap Functions (Originally by Angelo Canty for S)
class                         Functions for Classification
cluster                       "Finding Groups in Data": Cluster Analysis Extended
                              Rousseeuw et al.
codetools                     Code Analysis Tools for R
compiler                      The R Compiler Package
datasets                      The R Datasets Package
foreign                       Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat,
                              Weka, dBase, ...
graphics                      The R Graphics Package
grDevices                     The R Graphics Devices and Support for Colours and Fonts
grid                          The Grid Graphics Package
KernSmooth                    Functions for Kernel Smoothing Supporting Wand & Jones
                              (1995)
lattice                       Trellis Graphics for R
MASS                          Support Functions and Datasets for Venables and Ripley's
                              MASS
Matrix                        Sparse and Dense Matrix Classes and Methods
methods                       Formal Methods and Classes
mgcv                          Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness
                              Estimation
nlme                          Linear and Nonlinear Mixed Effects Models
nnet                          Feed-Forward Neural Networks and Multinomial Log-Linear
                              Models
parallel                      Support for Parallel computation in R
rpart                         Recursive Partitioning and Regression Trees
spatial                       Functions for Kriging and Point Pattern Analysis
splines                       Regression Spline Functions and Classes
stats                         The R Stats Package
stats4                        Statistical Functions using S4 Classes
survival                      Survival Analysis
tcltk                         Tcl/Tk Interface
tools                         Tools for Package Development
utils                         The R Utils Package

It's good to have sessionInfo like that, but also best to make the example fully reproducible (with `library()` calls, a small data set, etc.). See http://stackoverflow.com/a/28481250/ — Frank, Sep 13 '16 at 18:16
Very likely you want to use `paste` and not `cat`. Guess it should be `min.score<-paste('"', percentages[i], "%" , '"', sep = "")`. — nicola, Sep 13 '16 at 18:19
thank you nicola. I tried paste instead of cat. and the error is indeed gone but now I get a message for over 50 warnings. I'll work through this and come back. still, why do you think paste resolves the initial issue ? would be very nice to know. — piderotrema, Sep 13 '16 at 18:36
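On why `paste` helps: `cat()` prints its arguments to the console and returns `NULL`, so wrapping it in `as.character()` yields a zero-length character vector. Its type is still "character", which is why the `typeof()` check looks fine, but it is not a single string. The following base-R sketch illustrates the difference; the names `from_cat` and `from_paste` are made up for this example.

```r
percentages <- as.character(seq(0, 100, 5))

# cat() writes "0%" (including the quote characters) to the console
# and returns NULL; as.character(NULL) is character(0)
from_cat <- as.character(cat('"', percentages[1], "%", '"', sep = ""))
cat("\n")
typeof(from_cat)  # "character", so a type check passes
length(from_cat)  # 0: there is no string to pass on

# paste() returns the string instead of printing it
from_paste <- paste(percentages[1], "%", sep = "")
length(from_paste)  # 1
from_paste          # "0%"
```

Checking `length()` or `str()` in addition to `typeof()` would have exposed the empty vector.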

score 0 · Answer 1 · answered Sep 13 '16 at 19:13

nicola's comment did the trick.

This version works:

seq_set_matches <- matrix(1:42, 21, 2)
percentages <- as.character(seq(0, 100, 5))
for (i in seq_along(percentages)) {
  # Sum the motif hits over all sequences at the i-th threshold, e.g. "75%"
  seq_set_matches[i, 1] <- sum(sapply(DNAStringSet(seq_set[, 1]), function(s)
    countPWM(
      motifs[[1]],
      reverseComplement(s),
      min.score = paste(percentages[i], "%", sep = "")
    )))
}

Dear nicola, I'd love to accept your help as an official answer if you like. Thanks again.
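Note that the loop in this answer builds the threshold without the literal `'"'` pieces that appear in the original attempt and in the comment. Pasting `'"'` into the value makes the quote characters part of the string itself, so it no longer ends in `%`. This may be related to the warnings mentioned in the comments, although the thread does not say what they were. A small base-R sketch, with variable names chosen only for illustration:

```r
pct <- "75"

# Quote characters pasted in become part of the value
with_quotes <- paste('"', pct, "%", '"', sep = "")
# Plain concatenation gives the percentage string itself
without_quotes <- paste0(pct, "%")

nchar(with_quotes)     # 5
nchar(without_quotes)  # 3

# Last character of each string
substr(with_quotes, nchar(with_quotes), nchar(with_quotes))           # "\""
substr(without_quotes, nchar(without_quotes), nchar(without_quotes))  # "%"

# paste0() is vectorized, so all thresholds can be built in one call
thresholds <- paste0(seq(0, 100, 5), "%")
length(thresholds)  # 21
```

With `thresholds` precomputed, the loop body could pass `thresholds[i]` as `min.score` directly.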
