
I have a large dataset (2.8M rows × 4 columns) in R that I'm trying to transpose. I was attempting to use the reshape2::dcast function to do the transpose, but it runs out of memory.

Question 1: Is there a better way to do the transpose?

Question 2: I am attempting to chop the dataset up into pieces, do the transpose on the pieces, and then reassemble them. However, I'm running into an issue at the reassembly step: cbind requires that I know in advance which columns I want to join on. Is there a clever way around this?

library(reshape2)

bigtranspose <- function(dataset) {
  n <- nrow(dataset)
  i <- 1
  finaldataset <- NULL
  while (i <= n) {
    # take 10 rows at a time and do the transpose
    UB <- min(i + 9, n)
    small <- dataset[i:UB, ]
    smallmelt <- melt(small, id = c("memberID", "merchantID"))
    wide <- dcast(smallmelt, memberID ~ merchantID)

    # stack the results together
    if (is.null(finaldataset))
      finaldataset <- wide
    else
      finaldataset <- rbind(finaldataset, wide)
    i <- UB + 1
  }
  finaldataset
}
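One way around the reassembly problem (a sketch, assuming the plyr package is installed; the toy data and merchant column names here are made up) is plyr::rbind.fill, which stacks data frames without requiring their column sets to match in advance, padding missing columns with NA:

```r
# Sketch: combine wide chunks whose merchant columns differ, without
# knowing the full column list up front. Assumes plyr is installed.
library(plyr)

a <- data.frame(memberID = 1, m100 = 5)  # chunk 1 saw merchant m100
b <- data.frame(memberID = 2, m200 = 7)  # chunk 2 saw merchant m200

# rbind.fill pads columns absent from either piece with NA
combined <- rbind.fill(a, b)
combined
##   memberID m100 m200
## 1        1    5   NA
## 2        2   NA    7
```

Replacing the plain rbind in the loop above with rbind.fill would let each chunk contribute only the merchant columns it actually contains.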

1 Answer


You can just use the t function to do the transpose:

mat <- matrix(1:(3e+06 * 4), ncol = 4)
dim(mat)
## [1] 3000000       4

tmat <- t(mat)
dim(tmat)
## [1]       4 3000000


# And it's fast
system.time(tmat <- t(mat))
##    user  system elapsed 
##    0.05    0.03    0.08 
– CHP
  • Thanks, but I need to do a transpose "by" memberID. Can I do that with the t() function? In SAS it would be proc transpose data=foo ; by memberID; id=merchantID; var=logsum;run; – Scott Nelson Oct 18 '13 at 22:30
  • @ScottNelson Please check this [link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). A good reproducible example will help others to tackle your question lot more easily. – CHP Oct 20 '13 at 08:06
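For the by-memberID case raised in the comment, a long-to-wide reshape with dcast (rather than a matrix transpose) is closer to SAS proc transpose. A minimal sketch, using the column names memberID, merchantID, and logsum from the question and comment, with made-up data:

```r
# Sketch: by-memberID "transpose" (long to wide), analogous to
# proc transpose with by memberID / id merchantID / var logsum.
library(reshape2)

long <- data.frame(
  memberID   = c(1, 1, 2),
  merchantID = c("a", "b", "a"),
  logsum     = c(0.5, 1.2, 0.9)
)

# one row per memberID, one column per merchantID, cells from logsum
wide <- dcast(long, memberID ~ merchantID, value.var = "logsum")
wide
##   memberID   a   b
## 1        1 0.5 1.2
## 2        2 0.9  NA
```

Missing member/merchant combinations come out as NA, which is why chunked reshaping can yield pieces with differing columns.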