I'm using the "sva" package in R to estimate the surrogate variables in my data set. I have a data frame with 60000 rows and 8 columns (transcripts x samples). Each element of this matrix is a TPM value (Transcripts Per Million, a numerical value) for the corresponding transcript and sample. Here is the code I'm running:

   library(sva)
   library(Biobase)
   library(limma)

   data=read.table("mymatrix.txt")

   model  = model.matrix(~Sample_1+Sample_2,data=data)
   # NULL model
   model0 = model.matrix(~1,data=data) # intercept only

   # Run SVA
   svobj = sva(dat=data, mod=model, mod0=model0)

After running the "sva()" function, it gives me this error: Error in H %*% t(dat) : non-conformable arguments

Does anybody have an idea what that means? Here is an example of the .txt file:

                          Sample_1    Sample_2    Sample_3      Sample_4 
    ENSMUST00000060336.4 2.10642e-01 1.29483e-01 2.58036e-01 1.59276e-01
    ENSMUST00000060345.5 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00
    ENSMUST00000060348.2 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00
    ENSMUST00000060356.5 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00
    ENSMUST00000060357.8 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00

In the .txt file all the values are in "double" format, but when I read the file using "read.table", they are converted to "e" format. Is this the problem?

In the text file:

                          "Sample_1" "Sample_2"     "Sample_3"     "Sample_4"
    "ENSMUST00000000001.4" 18.7276    16.9755        12.138        20.0952 
    "ENSMUST00000000003.8" 0           0              0             0 
    "ENSMUST00000000010.8" 0           0              0             0 
    "ENSMUST00000000028.8" 0          0.427772       0.162408       0  
    "ENSMUST00000000033.6" 125.936    17.5017        21.3972        59.4235 
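On the "e" format question, here is a minimal sketch suggesting that the notation is only a matter of how R prints the numbers, not how it stores them. The two-row `txt` string is a hypothetical stand-in for `mymatrix.txt`, using values copied from the excerpts above; the variable names are illustrative only.

```r
# Hypothetical two-row stand-in for mymatrix.txt (values copied from the question).
txt <- '"Sample_1" "Sample_2"
"ENSMUST00000000001.4" 18.7276 16.9755
"ENSMUST00000000028.8" 0 0.427772'

# The header has one fewer field than the data rows, so read.table()
# treats the first line as a header and the first column as row names.
data <- read.table(text = txt)

# Storage type of each column, independent of how the values are printed.
col_types <- sapply(data, typeof)

# The same number written in two notations parses to the same double.
same_value <- identical(2.10642e-01, 0.210642)

# Printing notation is a display choice and can be switched per call.
as_fixed <- format(2.10642e-01, scientific = FALSE)
as_sci   <- format(0.210642, scientific = TRUE)

print(col_types)
print(same_value)
print(as_fixed)
print(as_sci)
```

If the fixed notation is preferred when printing, `options(scipen = ...)` is the usual knob; either way the values passed on to later functions are unchanged.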
user1080814
  • Can you run the `traceback()` function after this error to see where the error occurred? Also, could you provide the contents of `mymatrix.txt`? (If it's long, could you create an example with a few rows that reproduces the problem?) – David Robinson Sep 30 '14 at 01:27
  • sure, here is an example of the .txt file: – user1080814 Sep 30 '14 at 01:35
  • The problem is that your model matrices aren't model matrices at all: they should have as many rows *as your data has columns* (thus, in this case `model` and `model0` should both have 4 rows). The matrices are meant to represent known confounders, but having "Sample1 + Sample2" as your confounders doesn't make statistical or biological sense (what are 3 and 4 relative to that?) Can you elaborate on what your study design looks like? – David Robinson Sep 30 '14 at 02:32
  • Thank you so much for your help. I have tried to correct my code. Now I have 2 matrices (`model` and `batch`). The `model` matrix has 6000 rows and 8 columns. The `batch` matrix has 8 rows and one column. The only column in the `batch` matrix is named "age", which takes one of two values for each sample (old or young). Now I'm trying to build the `model.matrix` from these two matrices to feed to the "sva()" command. Any suggestions? – user1080814 Sep 30 '14 at 03:48
  • Have you tried `model1 = model.matrix(~age,data=batch)`, `model0 = model.matrix(~1,data=batch)`, then `svobj = sva(dat=data, mod=model1, mod0=model0)`? (Note that you probably don't want to put your 6000x8 expression data in a matrix called "model", since that'll get you confused between it and your actual model.) – David Robinson Sep 30 '14 at 04:11
  • I tried this code : `data<-read.table("model_data.txt") batch – user1080814 Sep 30 '14 at 05:05
  • At this point we'll need to see your data. Try `str(x)`. – Roman Luštrik Sep 30 '14 at 06:36
  • I ran this code: `svobj<-sva(dat=as.matrix(data),mod=model,mod0=model0)`. It worked without any error, but the problem is that `sva` could not find any surrogate variables and finished with this message: `No significant surrogate variables`. Any suggestions? – user1080814 Sep 30 '14 at 07:21
  • @user1080814 That's a statistical issue rather than a programming one: it suggests that you might not have any surrogate variables in the data worth correcting for. (On the whole that is generally *good* news, assuming you used sva correctly.) Incidentally, you should really look into [How to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for future questions. – David Robinson Sep 30 '14 at 11:06
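The point made in the comments, that the model matrices need one row per sample (one per column of the expression data), can be sketched in base R without calling `sva` at all. The toy objects `expr` and `batch` below are hypothetical, and the hat-matrix product is only an assumption about what the `H %*% t(dat)` in the error message refers to, not a statement about the package's actual internals.

```r
# Hypothetical toy data: 5 transcripts (rows) x 8 samples (columns).
set.seed(1)
expr <- matrix(rexp(5 * 8), nrow = 5,
               dimnames = list(paste0("tx", 1:5), paste0("Sample_", 1:8)))

# Hypothetical phenotype table: one row per sample, as in the `batch` comment.
batch <- data.frame(age = factor(rep(c("old", "young"), each = 4)),
                    row.names = colnames(expr))

mod  <- model.matrix(~ age, data = batch)  # intercept + age: 8 x 2
mod0 <- model.matrix(~ 1,   data = batch)  # intercept only:  8 x 1

# Hat matrix of the full model; it is 8 x 8 because mod has 8 rows.
H <- mod %*% solve(t(mod) %*% mod) %*% t(mod)
fitted_t <- H %*% t(expr)  # (8 x 8) %*% (8 x 5) conforms

# Building the design from the expression table itself, as in the question,
# yields one row per transcript instead of one row per sample.
bad_mod <- model.matrix(~ Sample_1 + Sample_2, data = as.data.frame(expr))
bad_H   <- bad_mod %*% solve(t(bad_mod) %*% bad_mod) %*% t(bad_mod)  # 5 x 5

# (5 x 5) %*% (8 x 5) cannot be multiplied; capture the error message.
bad <- tryCatch(bad_H %*% t(expr), error = function(e) conditionMessage(e))

print(dim(mod))
print(dim(fitted_t))
print(bad)
```

With matrices shaped like `mod` and `mod0`, the call suggested in the comments would take the form `sva(dat = as.matrix(data), mod = mod, mod0 = mod0)`.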

0 Answers