
I am rather new to R, so I would be grateful if anyone could help me :)

I have a large matrix (for example: matrix) and a vector of genes. My task is to search the matrix row by row and, for each gene that carries a mutation (in this matrix, D707H), compile pairs of that gene with each of the remaining genes in the vector, then add those pairs to a new matrix. I tried to do this with loops, but I have no idea how to write it correctly. For this matrix, the result should look something like this:

    PR.02.1431    
    NBN BRCA1
    NBN BRCA2
    NBN CHEK2
    NBN ELAC2
    NBN MSR1
    NBN PARP1
    NBN RNASEL
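A minimal base-R sketch of that pairing. The real matrix is only available as a screenshot, so this assumes a hypothetical layout: rows are samples, columns are genes, NA means no mutation, and any other value (such as D707H) marks a mutated gene. The second sample name below is made up.

```r
# Hypothetical stand-in for the real data (rows = samples, columns = genes)
genes <- c("BRCA1", "BRCA2", "CHEK2", "ELAC2", "MSR1", "NBN", "PARP1", "RNASEL")
mut <- matrix(NA_character_, nrow = 2, ncol = length(genes),
              dimnames = list(c("PR.02.1431", "PR.02.0000"), genes))
mut["PR.02.1431", "NBN"] <- "D707H"

# Row/column position of every mutated (non-NA) cell
hits <- which(!is.na(mut), arr.ind = TRUE)

# For each hit, pair the mutated gene with every other gene in the vector
pairs <- do.call(rbind, lapply(seq_len(nrow(hits)), function(k) {
  mutated <- colnames(mut)[hits[k, "col"]]
  data.frame(sample = rownames(mut)[hits[k, "row"]],
             gene1  = mutated,
             gene2  = setdiff(genes, mutated),
             stringsAsFactors = FALSE)
}))
pairs
```

For the single D707H hit this yields the seven NBN pairs listed above; a sample without mutations contributes no rows.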

So far I have something like this: my idea

"a" is my initial matrix.

Can anyone point me in the right direction? :)

  • Please make a reproducible example. See this: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Images are not the best way to explain what you need. Also include the desired result. – Pierre Lapointe Apr 15 '17 at 18:14
  • use the `gather()` function of the `dplyr` package, and filter some more. – knb Apr 15 '17 at 18:33
  • I believe you mean the `tidyr` package. – student Apr 15 '17 at 18:44
  • Please do not post code or data as images. It's fairly simple to use `dput(x)`, copy, paste, and indent (`ctrl-k` in stackoverflow question editor). I'm not about to transcribe data or code from an image into a console to test it out. – r2evans Apr 16 '17 at 04:02
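To illustrate the `dput()` suggestion: `dput()` prints R code that recreates an object exactly, which others can paste straight into a console. A short sketch with a hypothetical matrix `x`:

```r
# Hypothetical small matrix standing in for the real data
x <- matrix(c(NA, "D707H", NA, NA), nrow = 2,
            dimnames = list(c("s1", "s2"), c("NBN", "BRCA1")))

# Print a pasteable definition of x (this is the text to copy into a question)
dput(x)

# Round trip: the printed text parses back into an identical object
txt <- paste(capture.output(dput(x)), collapse = "\n")
y <- eval(parse(text = txt))
identical(x, y)
```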

2 Answers


Perhaps what you want/need is which(..., arr.ind = TRUE).

Some sample data, for demonstration:

set.seed(2)
n <- 10
mtx <- array(NA, dim = c(n, n))
dimnames(mtx) <- list(letters[1:n], LETTERS[1:n])
mtx[sample(n*n, size = 4)] <- paste0("x", 1:4)
mtx
#   A  B    C  D  E  F    G    H  I  J 
# a NA NA   NA NA NA NA   NA   NA NA NA
# b NA NA   NA NA NA NA   NA   NA NA NA
# c NA NA   NA NA NA NA   NA   NA NA NA
# d NA NA   NA NA NA NA   NA   NA NA NA
# e NA NA   NA NA NA NA   NA   NA NA NA
# f NA NA   NA NA NA NA   NA   NA NA NA
# g NA "x4" NA NA NA "x3" NA   NA NA NA
# h NA NA   NA NA NA NA   NA   NA NA NA
# i NA "x1" NA NA NA NA   NA   NA NA NA
# j NA NA   NA NA NA NA   "x2" NA NA NA

In your case, it appears that you want anything that is not an NA or NaN. You might try:

which(! is.na(mtx) & ! is.nan(mtx))
# [1] 17 19 57 70

but that isn't always intuitive when retrieving the row/column pairs (genes, I think?). Try instead:

ind <- which(! is.na(mtx) & ! is.nan(mtx), arr.ind = TRUE)
ind
#   row col
# g   7   2
# i   9   2
# g   7   6
# j  10   7

How to use this: the integers are row and column indices, respectively. Assuming your matrix is using row names and column names, you can retrieve the row names with:

rownames(mtx)[ ind[,"row"] ]
# [1] "g" "i" "g" "j"

(An astute reader might suggest I use rownames(ind) instead. It certainly works!) Similarly for the colnames and "col".
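Both routes to the row names agree because `which(..., arr.ind = TRUE)` copies the matrix's row names onto its result; the column names have to come from `colnames(mtx)` indexed by `"col"`. A sketch on the same sample data:

```r
# Same sample data as above
set.seed(2)
n <- 10
mtx <- array(NA, dim = c(n, n))
dimnames(mtx) <- list(letters[1:n], LETTERS[1:n])
mtx[sample(n*n, size = 4)] <- paste0("x", 1:4)
ind <- which(!is.na(mtx), arr.ind = TRUE)

# Row names: two equivalent routes when mtx has row names
r1 <- rownames(mtx)[ind[, "row"]]
r2 <- rownames(ind)
identical(r1, r2)

# Column names: index colnames(mtx) with the "col" column of ind
cn <- colnames(mtx)[ind[, "col"]]
cn
```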

Interestingly enough, even though ind is a matrix itself, you can subset mtx fairly easily with:

mtx[ind]
# [1] "x4" "x1" "x3" "x2"
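This works because indexing a matrix with a two-column matrix selects one element per index row, `m[i[k], j[k]]`, rather than a block of rows and columns. A small check on a hypothetical 3 x 3 matrix:

```r
m <- matrix(1:9, nrow = 3)               # hypothetical toy matrix, filled column by column
idx <- cbind(row = c(1, 3), col = c(2, 3))

m[idx]                                   # element-wise: m[1, 2] and m[3, 3]
c(m[1, 2], m[3, 3])                      # the same values, picked one at a time
m[idx[, "row"], idx[, "col"]]            # by contrast: the 2 x 2 block rows {1,3} x cols {2,3}
```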

Combining all three together, you might be able to use:

data.frame(
  gene1 = rownames(mtx)[ ind[,"row"] ],
  gene2 = colnames(mtx)[ ind[,"col"] ],
  val = mtx[ind]
)
#   gene1 gene2 val
# 1     g     B  x4
# 2     i     B  x1
# 3     g     F  x3
# 4     j     G  x2
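The same steps can be wrapped in a small reusable function. This is only a sketch: the name `nonmissing_cells()` is made up, and it drops the `is.nan()` test because `is.na()` already returns TRUE for NaN. It is tried here on a hypothetical numeric matrix containing both NA and NaN:

```r
# Hypothetical helper: one row per non-missing cell (row name, column name, value)
nonmissing_cells <- function(m) {
  stopifnot(is.matrix(m))                   # expects a matrix, not a data frame
  ind <- which(!is.na(m), arr.ind = TRUE)   # is.na() is TRUE for both NA and NaN
  data.frame(gene1 = rownames(m)[ind[, "row"]],
             gene2 = colnames(m)[ind[, "col"]],
             val   = m[ind],
             stringsAsFactors = FALSE)
}

m <- matrix(c(NA, 4, NaN, 7), nrow = 2,
            dimnames = list(c("a", "b"), c("A", "B")))
res <- nonmissing_cells(m)
res
```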
r2evans
  • I tried this method, but it had no effect... I get an error: _default method not implemented for type 'list'_. Do you know what to do about it? –  Apr 16 '17 at 15:25
  • Evidently your matrix is not a matrix. I can't help without seeing more of your data than a screenshot. – r2evans Apr 16 '17 at 15:32
  • It appears after calling the _which_ function. –  Apr 16 '17 at 15:33
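A likely cause, consistent with the reply above: `is.nan()` has no method for lists, and a data frame is a list, so calling it on a data frame fails with exactly this kind of error. A sketch with a hypothetical data frame `df`, showing the failure and the `as.matrix()` conversion that lets the answer's code run:

```r
# Hypothetical data frame standing in for the real data
df <- data.frame(A = c(NA, 4), B = c(7, NA), row.names = c("a", "b"))

# is.nan() on a data frame (a list) errors; capture the message instead of stopping
err <- tryCatch(is.nan(df), error = function(e) conditionMessage(e))
err

# Converting to a matrix first lets the answer's code run
m <- as.matrix(df)
ind <- which(!is.na(m) & !is.nan(m), arr.ind = TRUE)
ind
```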

I found where my mistake was; now I have a matrix. Analyzing your code, it works well, but it is not exactly what I want to do. a, b, c, d, etc. are organisms, and the row names are genes (A, B, C, D, etc.). I have to combine pairs of genes where one of them (in the same column) has something other than an NA value. For example, if gene A has value = 4 in column a, I need to get:

        gene1 gene2
    a   A     B
    a   A     C
    a   A     D
    a   A     E

I tried it this way, but the numbers of elements do not match, and I do not know how to solve this.

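# "a" is my initial matrix; "macierz" (Polish for "matrix") is another matrix, not shown here
ind = which(! is.na(a) & ! is.nan(a), arr.ind = TRUE)
ind1 = which(macierz == 1, arr.ind = TRUE)
# "ramka" = "frame", "kolumna" = "column"
ramka = data.frame(
  kolumna = rownames(a)[ ind[,"row"] ],
  gene1 = colnames(a)[ ind[,"col"] ],
  gene2 = colnames(a)[ ind1[,"col"] ]  # ind and ind1 can have different numbers of rows
  #val = macierz[ind]
)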
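The mismatch arises because `ind` and `ind1` have different numbers of rows, while `data.frame()` needs columns whose lengths match (or recycle evenly). A minimal sketch of one way around it, assuming the layout just described (rows = genes, columns = organisms) and that each non-NA gene is paired with every other gene in the same matrix: repeat each hit once per partner gene so that all columns line up. The matrix is hypothetical, and `macierz` is left out because it is not shown.

```r
# Hypothetical data in the layout described above: rows = genes, columns = organisms
a <- matrix(NA, nrow = 5, ncol = 2,
            dimnames = list(c("A", "B", "C", "D", "E"), c("a", "b")))
a["A", "a"] <- 4
a["C", "b"] <- 1

hits <- which(!is.na(a), arr.ind = TRUE)   # one row per non-NA cell
n_other <- nrow(a) - 1                     # each flagged gene gets one pair per other gene

# Repeat every hit n_other times so that all data.frame() columns have the same length
rep_hits <- hits[rep(seq_len(nrow(hits)), each = n_other), , drop = FALSE]

ramka <- data.frame(
  organism = colnames(a)[rep_hits[, "col"]],
  gene1    = rownames(a)[rep_hits[, "row"]],
  gene2    = unlist(lapply(hits[, "row"], function(r) rownames(a)[-r]), use.names = FALSE),
  stringsAsFactors = FALSE
)
ramka
```

The first four rows reproduce the a/A pairs requested above; the remaining rows come from the hypothetical hit for gene C in organism b.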

Do you know how to do this in R?

  • This really isn't an answer, it seems like a clarification (or addition) on your question. If so, please place it there (and remove this answer). If I'm mis-reading this and it is resolved with this code, then you can "accept" it yourself as self-answered. – r2evans Jan 19 '18 at 18:29