
I have three data frames: A, B and C.

A has 18,000 rows and 18,000 columns, and B has 150,000 rows and 5 columns.

I want to fill elements of A with values from B.

The loop below takes a long time. How can I make it run faster?

Example of A:

  Entrez_Gene_Id 2324 34345 4345 1234 3453
1 Entrez_Gene_Id    0     0    0    0    0
2          23040    0     0    0    0    0
3           7249    0     0    0    0    0
4          64478    0     0    0    0    0
5           4928    0     0    0    0    0
6          58191    0     0    0    0    0

Example of B:

> head(B)
  V1 Gene1 Gene2      weight   newWeight
1  1  4171  4172  2.01676494 0.020420929
2  2  2237  5111 1.933298567 0.015300857
3  4   506   509 2.439170425 0.020577243
4  7  6635  6636 2.255316779 0.081088975
5  8  6133  6210 3.427969232 0.021132906
6 10 23521  6217 1.607247743 0.027792961   

This is my code:

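# Convert every column of C to character so that the gene IDs
# are used as row/column names of A rather than as positions
B <- data.frame(lapply(C, as.character), stringsAsFactors = FALSE)

for (i in 1:nrow(B)) {
  Rname <- B[i, 2]            # Gene1: row name in A
  Cname <- B[i, 3]            # Gene2: column name in A
  A[Rname, Cname] <- B[i, 5]  # newWeight (a character string after the conversion above)
  print(i)                    # progress output on every iteration
}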
knifer

1 Answer


It seems as though you are trying to fill a full (dense) matrix from data stored in sparse triplet notation (row, column, value). You can use the dgCMatrix class from the Matrix package to do this:

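library(Matrix)
# Gene1 gives the row index, Gene2 the column index, newWeight the value;
# sparseMatrix() expects numeric indices and values here
b_mat <- sparseMatrix(i = B[, 2], j = B[, 3], x = B[, 5])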

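This leaves the matrix in sparse format. To convert it to a full (dense) matrix (note that, without a dims argument, its size is set by the largest Gene1 and Gene2 values, not by the number of genes):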

as.data.frame(as.matrix(b_mat))

EDIT: I would suggest leaving out the as.data.frame call here; given the number of columns you have, a plain matrix will be easier to work with.
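A slightly fuller sketch of the same idea, assuming B keeps numeric Gene1, Gene2 and newWeight columns (as in head(B) above) and that rows and columns should share one set of gene IDs, as in A. Passing raw Entrez IDs as i and j sizes the matrix by the largest ID, so here the IDs are first mapped to compact indices through a factor, in the spirit of Roland's comment below. The toy data and the name genes are illustrative only.

```r
library(Matrix)

# Toy triplets standing in for B (values copied from head(B) above)
B <- data.frame(Gene1     = c(4171, 2237, 506),
                Gene2     = c(4172, 5111, 509),
                newWeight = c(0.020420929, 0.015300857, 0.020577243))

# Assumption: rows and columns share one set of gene IDs, as in A
genes <- as.character(sort(unique(c(B$Gene1, B$Gene2))))
g1 <- factor(B$Gene1, levels = genes)  # row position of each triplet
g2 <- factor(B$Gene2, levels = genes)  # column position of each triplet

b_mat <- sparseMatrix(i = as.integer(g1), j = as.integer(g2),
                      x = B$newWeight,
                      dims = c(length(genes), length(genes)),
                      dimnames = list(genes, genes),
                      use.last.ij = TRUE)  # repeated pairs: keep the last value, like the loop

dense <- as.matrix(b_mat)  # base numeric matrix, zeros where B has no entry
```

By default sparseMatrix() sums duplicated (i, j) pairs, which is why use.last.ij = TRUE is set here. Keeping the sparse form is also worth considering for memory: a dense 18,000 x 18,000 matrix of doubles needs roughly 2.6 GB (18,000² × 8 bytes).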

Chris
  • Interesting package. Can you do a time comparison to `base R` methods? – Pierre L Feb 08 '16 at 14:44
  • @PierreLafortune It doesn't get more efficient than package Matrix when working with sparse matrices. – Roland Feb 08 '16 at 15:27
  • @PierreLafortune Matrix is a 'recommended' package expected to be present with every R installation. – Dirk Eddelbuettel Feb 08 '16 at 15:27
  • @knifer `B$Gene1 <- factor(B$Gene1); B$Gene2 <- factor(B$Gene2); sparseMatrix(i = as.integer(B$Gene1), j = as.integer(B$Gene2), x = B[, 5], dimnames = list(levels(B$Gene1), levels(B$Gene2)))` – Roland Feb 08 '16 at 15:34
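On the base R question in the comments: a sketch without the Matrix package, assuming A is first turned into a numeric matrix whose row and column names are the gene IDs (for a data frame laid out like the example, something like `A_mat <- as.matrix(A[, -1]); rownames(A_mat) <- A[, 1]`). Indexing a matrix with a two-column character matrix selects one cell per row by its dimnames, so the whole loop collapses into a single assignment. The names A_mat, idx and keep and the toy data are hypothetical.

```r
# Hypothetical stand-in for A: zero matrix with gene IDs as dimnames
genes <- c("23040", "7249", "64478", "4928")
A_mat <- matrix(0, nrow = length(genes), ncol = length(genes),
                dimnames = list(genes, genes))

# Hypothetical stand-in for B (Gene1 = row ID, Gene2 = column ID)
B <- data.frame(Gene1     = c(23040, 7249, 4928, 99999),
                Gene2     = c(7249, 64478, 23040, 23040),
                newWeight = c(0.02, 0.015, 0.08, 0.5))

# One (row name, column name) pair per row of B
idx <- cbind(as.character(B$Gene1), as.character(B$Gene2))

# Drop pairs whose IDs are not in A_mat; unmatched names would
# otherwise make the assignment fail with "subscript out of bounds"
keep <- idx[, 1] %in% rownames(A_mat) & idx[, 2] %in% colnames(A_mat)

# Vectorized replacement for the loop: fills every listed cell at once
A_mat[idx[keep, , drop = FALSE]] <- B$newWeight[keep]
```

As with the loop, if the same (Gene1, Gene2) pair appears more than once, the last value wins.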