6

I have a square data frame/matrix (equal numbers of rows and columns). I want to extract only the upper or lower triangle.

x<-data.frame(matrix(1:25,nrow=5))
colnames(x)<-LETTERS[1:5]
rownames(x)<-LETTERS[1:5]

x[upper.tri(x,diag=F)]

From this result, it is not possible to say what combination of column and row the value came from. So, I would like to have the row and column attributes in the results. Something like this:

Col Row Val
B   A   6
C   A   11
C   B   12
...

I need to do this for a large correlation matrix. Thanks.

mindlessgreen
  • Was one of the below solutions useful? If an answer does solve your problem you may want to *consider* upvoting and/or marking it as accepted to show the question has been answered, by ticking the little green check mark next to the suitable answer. You are **not** obliged to do this, but it helps keep the site clean of unanswered questions and rewards those who take the time to solve your problem. – Simon O'Hanlon Aug 13 '13 at 08:48

4 Answers

6

I'd just use which with arr.ind = TRUE like this:

ind <- which( upper.tri(x,diag=F) , arr.ind = TRUE )

data.frame( col = dimnames(x)[[2]][ind[,2]] ,
            row = dimnames(x)[[1]][ind[,1]] ,
            val = x[ ind ] )

   col row val
1    B   A   6
2    C   A  11
3    C   B  12
4    D   A  16
5    D   B  17
6    D   C  18
7    E   A  21
8    E   B  22
9    E   C  23
10   E   D  24
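
Since the question mentions a large correlation matrix, the same idea could be wrapped in a small function. This is only a sketch: the name `triangle_to_long` and its `upper` argument are hypothetical and not part of the answer above, and `mtcars` is used purely as stand-in data.

```r
# Hypothetical helper (not from the original answer): wraps the
# which(..., arr.ind = TRUE) approach so it can be reused.
triangle_to_long <- function(m, upper = TRUE) {
  m <- as.matrix(m)  # accepts a data frame or a matrix
  keep <- if (upper) upper.tri(m, diag = FALSE) else lower.tri(m, diag = FALSE)
  ind <- which(keep, arr.ind = TRUE)  # two columns: row index, column index
  data.frame(col = colnames(m)[ind[, 2]],
             row = rownames(m)[ind[, 1]],
             val = m[ind],
             stringsAsFactors = FALSE)
}

# Stand-in correlation matrix built from a dataset that ships with R
cm <- cor(mtcars[, 1:4])
res <- triangle_to_long(cm)
```

Calling `triangle_to_long(cm, upper = FALSE)` should give the lower triangle in the same long format.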
Simon O'Hanlon
4

First, to make the row and column labels unambiguous, I rename the columns:

colnames(x) <- LETTERS[6:10]

Use expand.grid to get the row and column names like this

rowCol <- expand.grid(rownames(x), colnames(x))

To get the correct rows from this data frame, take

labs <- rowCol[as.vector(upper.tri(x,diag=F)),]
df <- cbind(labs, x[upper.tri(x,diag=F)])
colnames(df) <- c("Row","Col","Val")
df[,c(2,1,3)]
##    Col Row Val
## 6    G   A   6
## 11   H   A  11
## ...
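
The logical mask lines up with the rows of rowCol because expand.grid varies its first argument fastest, which matches R's column-major matrix storage. A small check of that assumption, using the same toy data (the names `m` and `looked_up` are just for illustration):

```r
x <- data.frame(matrix(1:25, nrow = 5))
rownames(x) <- LETTERS[1:5]
colnames(x) <- LETTERS[6:10]

# One row per (row label, column label) pair, first argument varying fastest
rowCol <- expand.grid(rownames(x), colnames(x), stringsAsFactors = FALSE)

m <- as.matrix(x)
# Look up every cell by its labels, in the order expand.grid produced them
looked_up <- m[cbind(rowCol$Var1, rowCol$Var2)]

# If the orderings agree, this equals the matrix flattened column by column
stopifnot(identical(looked_up, as.vector(m)))
```

Any replacement for expand.grid would need to produce the pairs in this same order, otherwise the labels and values would no longer match.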
James Pringle
  • `expand.grid` is too slow. Try `data.table::CJ` instead: `rowCol <- data.table::CJ(rownames(x), colnames(x), sorted = F)` – Feng Tian Nov 22 '19 at 07:49
1

... this might be a solution

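ind <- which(upper.tri(x, diag = FALSE), arr.ind = TRUE)  # index matrix, as in the answer above
nam <- apply(ind, 2, function(y, x) rownames(x)[c(y)], x = x)  # works here because row and column names are identical
cbind(nam, x[upper.tri(x, diag = FALSE)])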

hth

holzben
  • Thanks. I just found the solution here.. http://stackoverflow.com/questions/7074246/show-correlations-as-an-ordered-list-not-as-a-large-matrix – mindlessgreen Aug 08 '13 at 13:53
0

The lower triangle is defined with the expression "column index is not greater than the row index". This code gives the lower triangle (or upper by switching the > operator) a value of 0. Use "" in place of 0 to keep the triangle.

x[!(col(x) > row(x))] <- 0

To produce a data set like the one in the original post, I would use reshape2::melt together with dplyr::filter and dplyr::select.

First create an id variable to melt on.

x$id <- rownames(x)

Then,

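library(reshape2)  # melt
library(dplyr)     # %>%, filter, select

# value > 0 keeps only the cells that were not zeroed, so it assumes every retained value is positive
melt(x, id = "id") %>%
 filter(value > 0 ) %>%
   select(Col = variable, Row = id, Val = value)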

   Col Row Val
1    B   A   6
2    C   A  11
3    C   B  12
4    D   A  16
5    D   B  17
6    D   C  18
7    E   A  21
8    E   B  22
9    E   C  23
10   E   D  24
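
One caveat for the correlation-matrix use case: filtering on value > 0 would also drop negative correlations. A possible variant, sketched here in base R with as.table instead of reshape2/dplyr (so it is a swapped-in technique, not the method above), masks the unwanted cells with NA and filters on that instead. `mtcars` is only stand-in data, and `cm` and `long` are illustrative names.

```r
# Stand-in correlation matrix; it contains negative correlations
cm <- cor(mtcars[, 1:4])

# Mask the lower triangle and the diagonal with NA rather than 0
cm[!(col(cm) > row(cm))] <- NA

# as.table + as.data.frame gives one row per cell: row label, column label, value
long <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(long) <- c("Row", "Col", "Val")

# Keep only the unmasked cells, ordered as in the question
long <- long[!is.na(long$Val), c("Col", "Row", "Val")]
```

Because the filter tests for NA rather than sign, negative and zero correlations in the upper triangle are retained.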