I am trying to run a miRNA correlation analysis with the anamiR package in R (via RStudio). The script I am using is:

library(anamiR)

mrna1 = read.csv("D:\\file1.csv", row.names = 1, header= TRUE)
mrna <- as.matrix(mrna1)
rm(mrna1) 
mirna1 = read.csv("D:\\file2.csv", row.names = 1, header= TRUE)
mirna <- as.matrix(mirna1)
rm(mirna1) 
pheno.mirna1 = read.csv("D:\\file3.csv", row.names = 1, header= TRUE)
pheno.mirna <- as.matrix(pheno.mirna1)
rm(pheno.mirna1) 
pheno.mrna1 = read.csv("D:\\file4.csv", row.names = 1, header= TRUE)
pheno.mrna <- as.matrix(pheno.mrna1)
rm(pheno.mrna1)

mrna_se <- SummarizedExperiment::SummarizedExperiment(
  assays = S4Vectors::SimpleList(counts=mrna),
  colData = pheno.mrna)

mirna_se <- SummarizedExperiment::SummarizedExperiment(
  assays = S4Vectors::SimpleList(counts=mirna),
  colData = pheno.mirna)

mrna_d <- differExp_discrete(se = mrna_se,
                             class = "ER", method = "DESeq",
                             t_test.var = FALSE, log2 = FALSE,
                             p_value.cutoff = 0.05,  logratio = 0.5
)

mirna_d <- differExp_discrete(se = mirna_se,
                              class = "ER", method = "DESeq",
                              t_test.var = FALSE, log2 = FALSE,
                              p_value.cutoff = 0.05,  logratio = 0.5
)

The error is generated when I reach these two calls:

mrna_d <- differExp_discrete(se = mrna_se,
                             class = "ER", method = "DESeq",
                             t_test.var = FALSE, log2 = FALSE,
                             p_value.cutoff = 0.05,  logratio = 0.5
)

mirna_d <- differExp_discrete(se = mirna_se,
                              class = "ER", method = "DESeq",
                              t_test.var = FALSE, log2 = FALSE,
                              p_value.cutoff = 0.05,  logratio = 0.5
)

I get

Error in model.matrix.formula(design(object), colData(object)) : 
  data must be a data.frame
In addition: Warning message:
In DESeq2::DESeqDataSet(se, design = tmp) :
  some variables in design formula are characters, converting to factors

My sessionInfo is:

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] anamiR_1.13.0

loaded via a namespace (and not attached):
  [1] backports_1.2.1             Hmisc_4.5-0                 BiocFileCache_1.10.2        plyr_1.8.6                 
  [5] splines_3.6.0               BiocParallel_1.20.1         AlgDesign_1.2.0             GenomeInfoDb_1.22.1        
  [9] ggplot2_3.3.3               digest_0.6.27               foreach_1.5.1               htmltools_0.5.1.1          
 [13] fansi_0.4.2                 magrittr_2.0.1              checkmate_2.0.0             memoise_2.0.0              
 [17] cluster_2.1.1               limma_3.42.2                readr_1.4.0                 Biostrings_2.54.0          
 [21] annotate_1.64.0             matrixStats_0.58.0          askpass_1.1                 siggenes_1.60.0            
 [25] prettyunits_1.1.1           jpeg_0.1-8.1                colorspace_2.0-0            rappdirs_0.3.3             
 [29] blob_1.2.1                  haven_2.3.1                 xfun_0.22                   dplyr_1.0.5                
 [33] crayon_1.4.1                RCurl_1.98-1.3              graph_1.64.0                genefilter_1.68.0          
 [37] GEOquery_2.54.1             survival_3.2-10             iterators_1.0.13            glue_1.4.2                 
 [41] gtable_0.3.0                lumi_2.38.0                 zlibbioc_1.32.0             XVector_0.26.0             
 [45] DelayedArray_0.12.3         questionr_0.7.4             Rhdf5lib_1.8.0              BiocGenerics_0.32.0        
 [49] HDF5Array_1.14.4            scales_1.1.1                rngtools_1.5                DBI_1.1.1                  
 [53] miniUI_0.1.1.1              Rcpp_1.0.6                  progress_1.2.2              xtable_1.8-4               
 [57] htmlTable_2.1.0             gage_2.36.0                 bumphunter_1.28.0           foreign_0.8-71             
 [61] bit_4.0.4                   mclust_5.4.7                preprocessCore_1.48.0       Formula_1.2-4              
 [65] stats4_3.6.0                htmlwidgets_1.5.3           httr_1.4.2                  gplots_3.1.1               
 [69] RColorBrewer_1.1-2          ellipsis_0.3.1              pkgconfig_2.0.3             reshape_0.8.8              
 [73] XML_3.99-0.3                dbplyr_2.1.0                nnet_7.3-15                 locfit_1.5-9.4             
 [77] utf8_1.2.1                  tidyselect_1.1.0            rlang_0.4.10                later_1.1.0.1              
 [81] AnnotationDbi_1.48.0        munsell_0.5.0               tools_3.6.0                 cachem_1.0.4               
 [85] generics_0.1.0              RSQLite_2.2.5               stringr_1.4.0               fastmap_1.1.0              
 [89] knitr_1.31                  bit64_4.0.5                 beanplot_1.2                caTools_1.18.2             
 [93] methylumi_2.32.0            scrime_1.3.5                purrr_0.3.4                 KEGGREST_1.26.1            
 [97] doRNG_1.8.2                 nlme_3.1-152                mime_0.10                   nor1mix_1.3-0              
[101] xml2_1.3.2                  biomaRt_2.42.1              compiler_3.6.0              rstudioapi_0.13            
[105] curl_4.3                    png_0.1-7                   affyio_1.56.0               klaR_0.6-15                
[109] tibble_3.1.0                geneplotter_1.64.0          stringi_1.5.3               highr_0.8                  
[113] GenomicFeatures_1.38.2      minfi_1.32.0                forcats_0.5.1               lattice_0.20-41            
[117] Matrix_1.3-2                multtest_2.42.0             vctrs_0.3.7                 pillar_1.5.1               
[121] lifecycle_1.0.0             BiocManager_1.30.12         combinat_0.0-8              data.table_1.14.0          
[125] bitops_1.0-6                rtracklayer_1.46.0          httpuv_1.5.5                agricolae_1.3-3            
[129] GenomicRanges_1.38.0        affy_1.64.0                 R6_2.5.0                    latticeExtra_0.6-29        
[133] RMySQL_0.10.21              promises_1.2.0.1            KernSmooth_2.23-18          gridExtra_2.3              
[137] nleqslv_3.3.2               IRanges_2.20.2              codetools_0.2-18            MASS_7.3-53.1              
[141] gtools_3.8.2                assertthat_0.2.1            rhdf5_2.30.1                Sum
[145] openssl_1.4.3               DESeq2_1.26.0               GenomicAlignments_1.22.1    Rsamtools_2.2.3            
[149] S4Vectors_0.24.4            GenomeInfoDbData_1.2.2      mgcv_1.8-34                 parallel_3.6.0             
[153] hms_1.0.0                   quadprog_1.5-8              grid_3.6.0                  rpart_4.1-15               
[157] labelled_2.8.0              tidyr_1.1.3                 base64_2.0                  DelayedMatrixStats_1.8.0   
[161] illuminaio_0.28.0           Biobase_2.46.0              shiny_1.6.0                 base64enc_0.1-3        

I can change R versions, but that does not help. I have traced the problem to both mrna_se@colData and mirna_se@colData not being data frames:

> is.data.frame(mirna_se@colData)
[1] FALSE
> is.data.frame(mrna_se@colData)
[1] FALSE

So how can I convert these objects within the overall S4 object to data frames so that DESeq2 can use them to produce differential expression data? This is driving me insane.

Also before anyone asks:

> packageVersion("DESeq2")
[1] ‘1.26.0’
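
A note on the `is.data.frame()` checks above: the `colData` of a `SummarizedExperiment` is stored as an S4Vectors `DataFrame`, not as a base `data.frame`, so a result of `FALSE` is expected and may not be the cause by itself. The following is a minimal sketch with made-up toy data (the object names `counts`, `pheno`, `cd` and the level `case` are hypothetical), guarded so that it only runs when SummarizedExperiment is installed:

```r
# Sketch only: needs the Bioconductor package SummarizedExperiment.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  # Toy count matrix: 2 genes x 4 samples (hypothetical values).
  counts <- matrix(1:8, nrow = 2,
                   dimnames = list(c("g1", "g2"), paste0("s", 1:4)))
  # Toy phenotype table as a plain data.frame, one row per sample.
  pheno <- data.frame(ER = factor(c("control", "control", "case", "case")),
                      row.names = colnames(counts))

  se <- SummarizedExperiment::SummarizedExperiment(
    assays = S4Vectors::SimpleList(counts = counts),
    colData = pheno)

  cd <- SummarizedExperiment::colData(se)
  print(is.data.frame(cd))                 # a data.frame went in, but this is FALSE
  print(methods::is(cd, "DataFrame"))      # stored as an S4Vectors DataFrame
  print(is.data.frame(as.data.frame(cd)))  # explicit conversion gives a data.frame
}
```

`as.data.frame(colData(se))` is the conversion asked about, but it returns a copy; the object stored inside the `SummarizedExperiment` remains a `DataFrame`.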

In response to the comment, I changed the code as shown below and now get the following error:

mrna_se <- SummarizedExperiment::SummarizedExperiment(
  assays = S4Vectors::SimpleList(counts=mrna),
  colData = as.data.frame(pheno.mrna))
  it appears that the last variable in the design formula, 'ER',
  has a factor level, 'control', which is not the reference level. we recommend
  to use factor(...,levels=...) or relevel() to set this as the reference level
  before proceeding. for more information, please see the 'Note on factor levels'
  in vignette('DESeq2').
Error in model.matrix.formula(design(object), colData(object)) : 
  data must be a data.frame

Further Edits:

If you read the CSV files in without converting any of them to a matrix, you get:

> library(anamiR)
> 
> mrna = read.csv("D:\\file1.csv", row.names = 1, header= TRUE)
> mirna = read.csv("D:\\file2.csv", row.names = 1, header= TRUE)
> pheno.mirna = read.csv("D:\\file3.csv", row.names = 1, header= TRUE)
> pheno.mrna = read.csv("D:\\file4.csv", row.names = 1, header= TRUE)
> 
> mrna_se <- SummarizedExperiment::SummarizedExperiment(
+   assays = S4Vectors::SimpleList(counts=mrna),
+   colData = as.data.frame(pheno.mrna))
Error in all_dims[, 1L] : incorrect number of dimensions
> 
> mirna_se <- SummarizedExperiment::SummarizedExperiment(
+   assays = S4Vectors::SimpleList(counts=mirna),
+   colData = pheno.mirna)
Error in all_dims[, 1L] : incorrect number of dimensions
> 
> mrna_d <- differExp_discrete(se = mrna_se,
+                              class = "ER", method = "DESeq",
+                              t_test.var = FALSE, log2 = FALSE,
+                              p_value.cutoff = 0.05,  logratio = 0.5
+ )
Error in SummarizedExperiment::assays(se) : object 'mrna_se' not found
> 
> mirna_d <- differExp_discrete(se = mirna_se,
+                               class = "ER", method = "DESeq",
+                               t_test.var = FALSE, log2 = FALSE,
+                               p_value.cutoff = 0.05,  logratio = 0.5
+ )
Error in SummarizedExperiment::assays(se) : object 'mirna_se' not found
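
The `incorrect number of dimensions` errors above arise when the count tables are passed as data frames. A minimal base-R sketch of the distinction, using inline toy CSV text in place of the real files (all values and names are hypothetical): the counts are converted to a matrix, while the phenotype table is left as a data frame.

```r
# Inline stand-ins for the count and phenotype CSV files (toy data).
counts_csv <- "gene,s1,s2,s3,s4
g1,10,12,30,28
g2,5,7,6,4"
pheno_csv <- "sample,ER
s1,control
s2,control
s3,case
s4,case"

counts_df <- read.csv(text = counts_csv, row.names = 1, header = TRUE)
pheno     <- read.csv(text = pheno_csv,  row.names = 1, header = TRUE)

print(is.matrix(counts_df))   # FALSE: read.csv() returns a data.frame
counts <- as.matrix(counts_df)
print(is.matrix(counts))      # TRUE: suitable for the assays slot

print(is.data.frame(pheno))   # TRUE: keep the column data as a data.frame
print(is.character(as.matrix(pheno)))  # TRUE: as.matrix() would make it character

# Sample names in the counts must line up with the phenotype rows.
stopifnot(identical(colnames(counts), rownames(pheno)))
```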

Traceback on my original error is (showing one file using the proposed as.data.frame solution and the other one using my original matrix loading):

> library(anamiR)
> 
> mrna1 = read.csv("D:\\file1.csv.csv", row.names = 1, header= TRUE)
> mrna <- as.matrix(mrna1)
> rm(mrna1) 
> mirna1 = read.csv("D:\\file2.csv.csv", row.names = 1, header= TRUE)
> mirna <- as.matrix(mirna1)
> rm(mirna1) 
> pheno.mirna1 = read.csv("D:\\file3.csv.csv", row.names = 1, header= TRUE)
> pheno.mirna <- as.matrix(pheno.mirna1)
> rm(pheno.mirna1) 
> pheno.mrna1 = read.csv("D:\\file4.csv.csv", row.names = 1, header= TRUE)
> pheno.mrna <- as.matrix(pheno.mrna1)
> rm(pheno.mrna1)
> 
> mrna_se <- SummarizedExperiment::SummarizedExperiment(
+   assays = S4Vectors::SimpleList(counts=mrna),
+   colData = as.data.frame(pheno.mrna))
> 
> mirna_se <- SummarizedExperiment::SummarizedExperiment(
+   assays = S4Vectors::SimpleList(counts=mirna),
+   colData = pheno.mirna)
> 
> mrna_d <- differExp_discrete(se = mrna_se,
+                              class = "ER", method = "DESeq",
+                              t_test.var = FALSE, log2 = FALSE,
+                              p_value.cutoff = 0.05,  logratio = 0.5
+ )
  it appears that the last variable in the design formula, 'ER',
  has a factor level, 'control', which is not the reference level. we recommend
  to use factor(...,levels=...) or relevel() to set this as the reference level
  before proceeding. for more information, please see the 'Note on factor levels'
  in vignette('DESeq2').
Error in model.matrix.formula(design(object), colData(object)) : 
  data must be a data.frame
> 
> traceback()
6: stop("data must be a data.frame")
5: model.matrix.formula(design(object), colData(object))
4: stats::model.matrix(design(object), colData(object))
3: designAndArgChecker(object, betaPrior)
2: DESeq2::DESeq(dds)
1: differExp_discrete(se = mrna_se, class = "ER", method = "DESeq", 
       t_test.var = FALSE, log2 = FALSE, p_value.cutoff = 0.05, 
       logratio = 0.5)
> 
> mirna_d <- differExp_discrete(se = mirna_se,
+                               class = "ER", method = "DESeq",
+                               t_test.var = FALSE, log2 = FALSE,
+                               p_value.cutoff = 0.05,  logratio = 0.5
+ )
Error in model.matrix.formula(design(object), colData(object)) : 
  data must be a data.frame
In addition: Warning message:
In DESeq2::DESeqDataSet(se, design = tmp) :
  some variables in design formula are characters, converting to factors
> 
> traceback()
6: stop("data must be a data.frame")
5: model.matrix.formula(design(object), colData(object))
4: stats::model.matrix(design(object), colData(object))
3: designAndArgChecker(object, betaPrior)
2: DESeq2::DESeq(dds)
1: differExp_discrete(se = mirna_se, class = "ER", method = "DESeq", 
       t_test.var = FALSE, log2 = FALSE, p_value.cutoff = 0.05, 
       logratio = 0.5)
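
In both tracebacks, frame 5 is `model.matrix.formula`. Base R's stats package provides `model.matrix.default` and `model.matrix.lm` but no `formula` method, so that method may have been registered by another package loaded in the session. A small diagnostic sketch, to be run in the same session as the failing call (the variable name `m` is arbitrary):

```r
# Look up an S3 method model.matrix.formula, if any package has registered one.
m <- utils::getS3method("model.matrix", "formula", optional = TRUE)

if (is.null(m)) {
  message("No model.matrix.formula method is registered in this session.")
} else {
  # The enclosing environment is normally the namespace that defines the method.
  message("model.matrix.formula comes from: ",
          environmentName(environment(m)))
}
```

If a package name is reported, that is a candidate to investigate; this sketch does not by itself confirm that the method causes the error.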

Traceback on the new one is:

> library(anamiR)
> 
> mrna = read.csv("D:\\file1.csv", row.names = 1, header= TRUE)
> mirna = read.csv("D:\\file2.csv", row.names = 1, header= TRUE)
> pheno.mirna = read.csv("D:\\file3.csv", row.names = 1, header= TRUE)
> pheno.mrna = read.csv("D:\\file4.csv", row.names = 1, header= TRUE)
> 
> mrna_se <- SummarizedExperiment::SummarizedExperiment(
+   assays = S4Vectors::SimpleList(counts=mrna),
+   colData = as.data.frame(pheno.mrna))
Error in all_dims[, 1L] : incorrect number of dimensions
> traceback()
7: method(object)
6: validityMethod(as(object, superClass))
5: isTRUE(x)
4: anyStrings(validityMethod(as(object, superClass)))
3: validObject(ans)
2: Assays(assays)
1: SummarizedExperiment::SummarizedExperiment(assays = S4Vectors::SimpleList(counts = mrna), 
       colData = as.data.frame(pheno.mrna))
> 
> mirna_se <- SummarizedExperiment::SummarizedExperiment(
+   assays = S4Vectors::SimpleList(counts=mirna),
+   colData = pheno.mirna)
Error in all_dims[, 1L] : incorrect number of dimensions
> traceback()
7: method(object)
6: validityMethod(as(object, superClass))
5: isTRUE(x)
4: anyStrings(validityMethod(as(object, superClass)))
3: validObject(ans)
2: Assays(assays)
1: SummarizedExperiment::SummarizedExperiment(assays = S4Vectors::SimpleList(counts = mirna), 
       colData = pheno.mirna)

Latest Edits (12/4/21)

So in response to comments I am now loading my data files as below:

mrna <- as.matrix(read.csv("D:\\CorrelationDataProcessing\\TRAMP30w\\mrnaTRAMP_Mut30w_v_WT30w_normcounts.csv", row.names = 1, header= TRUE))
mirna <- as.matrix(read.csv("D:\\CorrelationDataProcessing\\TRAMP30w\\mirnaTRAMP_Mut30w_vs_WT30w_normcounts.csv", row.names = 1, header= TRUE))
pheno.mirna = read.csv("D:\\CorrelationDataProcessing\\TRAMP30w\\mirnapheno.csv", row.names = 1, header= TRUE)
pheno.mrna = read.csv("D:\\CorrelationDataProcessing\\TRAMP30w\\mrnapheno.csv", row.names = 1, header= TRUE)

This results in:

> mrna_d <- differExp_discrete(se = mrna_se,
+                              class = "ER", method = "DESeq",
+                              t_test.var = FALSE, log2 = FALSE,
+                              p_value.cutoff = 0.05,  logratio = 0.5
+ )
  it appears that the last variable in the design formula, 'ER',
  has a factor level, 'control', which is not the reference level. we recommend
  to use factor(...,levels=...) or relevel() to set this as the reference level
  before proceeding. for more information, please see the 'Note on factor levels'
  in vignette('DESeq2').
Error in model.matrix.formula(design(object), colData(object)) : 
  data must be a data.frame
> 
> mirna_d <- differExp_discrete(se = mirna_se,
+                               class = "ER", method = "DESeq",
+                               t_test.var = FALSE, log2 = FALSE,
+                               p_value.cutoff = 0.05,  logratio = 0.5
+ )
  it appears that the last variable in the design formula, 'ER',
  has a factor level, 'control', which is not the reference level. we recommend
  to use factor(...,levels=...) or relevel() to set this as the reference level
  before proceeding. for more information, please see the 'Note on factor levels'
  in vignette('DESeq2').
Error in model.matrix.formula(design(object), colData(object)) : 
  data must be a data.frame
scp4151
  • can you trim it down to the relevant bits of code that shows the error? – StupidWolf Apr 06 '21 at 07:54
  • `pheno.mrna` seems to be a matrix. what is preventing you from doing `SummarizedExperiment( .. ,colData = as.data.frame(pheno.mrna)` ? – StupidWolf Apr 06 '21 at 07:55
  • Sorry but this does something different. – scp4151 Apr 06 '21 at 08:13
  • I made some edits. – scp4151 Apr 06 '21 at 08:22
  • Remove the `as.matrix` calls in your code for the column data. As a point on programming style, don’t create temporary variables and `rm` them afterwards. If you want to limit the scope of your variables (good idea!), use `local` evaluation or functions. – Konrad Rudolph Apr 06 '21 at 08:27
  • could you post a `traceback()` from your error? (Or set `options(error=recover)` and show us the list of function calls?) That will help us work out exactly where problems are likely to be happening. – JDL Apr 06 '21 at 10:33
  • You misunderstood my comment. You need to remove `as.matrix` *only for the column data*. By contrast, the *count data* needs to be a matrix; that’s why you’re now getting a new error message. Furthermore, check the spelling of your variable names, you’ve now got a new error because you’re referring to a nonexistent variable name. – Konrad Rudolph Apr 07 '21 at 08:06
  • @Konrad Rudolph I removed the ```as.matrix``` in reference to the pheno.mirna/mrna files and get ``` it appears that the last variable in the design formula, 'ER', has a factor level, 'control', which is not the reference level. we recommend to use factor(...,levels=...) or relevel() to set this as the reference level before proceeding. for more information, please see the 'Note on factor levels' in vignette('DESeq2').``` ```Error in model.matrix.formula(design(object), colData(object)) : data must be a data.frame```. – scp4151 Apr 12 '21 at 02:10
  • Please provide a [MWE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In its current form nobody can help, because the problem most likely comes from your input data. – Mitra Apr 14 '21 at 17:04
  • Besides, this package is outdated and it seems that [it isn't maintained](https://github.com/AllenTiTaiWang/anamiR/issues/2) anymore. – Mitra Apr 14 '21 at 17:09
  • Indeed. We tried and I am just writing my own program to do this now in Python so I will close the question. – scp4151 Apr 19 '21 at 03:30

1 Answer

Unfortunately, it appears that the package has issues and is being deprecated anyway. At this point I would suggest that anyone wanting to undertake miRNA correlation avoid both anamiR and miRComb.

scp4151