
I would like to do PCA on a dataframe that is in long form:

time1 id1 data11
time1 id2 data12
time2 id1 data21
etc.

Is there an easy way to do this, or is the standard way to reshape it to wide form and then run princomp? My dataset is pretty large, with roughly 40,000 times and 4,000 ids.

Alex

1 Answer


For such a simple reshaping I think all you need is

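# assumes rows are sorted by time, then id, with every (time, id) pair present once
ntimes <- length(unique(mydata[, 1]))
m <- matrix(mydata[, 3], nrow = ntimes, byrow = TRUE)
princomp(m)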

This should give you an ntimes by nIDs matrix to play with. It will be (potentially a lot) faster than reshape.
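The direct matrix fill only lines up if the rows are ordered by time and then by id, and if no (time, id) pair is missing or repeated. Here is a minimal sketch of how one might guard against that on a toy dataset. The column names `time`, `id` and `value` are hypothetical and not taken from the question.

```r
# Toy long-format data: 3 times x 2 ids, deliberately shuffled.
# Column names (time, id, value) are illustrative assumptions.
mydata <- data.frame(
  time  = c(2, 1, 3, 1, 2, 3),
  id    = c("a", "b", "a", "a", "b", "b"),
  value = c(21, 12, 31, 11, 22, 32)
)

# Sort by time, then id, so byrow = TRUE puts one time point in each row.
mydata <- mydata[order(mydata$time, mydata$id), ]

times <- unique(mydata$time)
ids   <- unique(mydata$id)

# Guard: every (time, id) pair must appear exactly once.
stopifnot(
  nrow(mydata) == length(times) * length(ids),
  !any(duplicated(mydata[, c("time", "id")]))
)

# Rows are times, columns are ids.
m <- matrix(mydata$value, nrow = length(times), byrow = TRUE,
            dimnames = list(as.character(times), as.character(ids)))
```

If the `stopifnot` check fails, the data are unbalanced and a proper reshape is probably the safer route. Otherwise `m` can be handed to `princomp` as above.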

Ben Bolker
  • Great suggestion, thank you! I'll give that a shot, but it seems like it will work. – Alex Nov 18 '11 at 21:27
  • Faster, but also more dangerous. – hadley Nov 19 '11 at 03:29
  • Fair enough. The resulting matrix will be 610 Mb, though (`print(object.size(seq(1.6e8)),units="Mb")`, or I could just have multiplied 4 bytes by 1.6e8!). The OP might need all the speed they can get. I was going to run some benchmarks but I can't easily create such a big object on my laptop. – Ben Bolker Nov 19 '11 at 04:41
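
As a rough check on the size estimate in the last comment, using the dimensions from the question (40,000 times by 4,000 ids): the 4-byte figure corresponds to R's integer storage, which is what `seq(1.6e8)` produces. If the data column is stored as doubles (an assumption about the OP's data), each entry takes 8 bytes and the matrix is about twice as large.

```r
n_cells <- 40000 * 4000            # 1.6e8 matrix entries

mb_integer <- n_cells * 4 / 2^20   # 4 bytes per entry (integer storage)
mb_double  <- n_cells * 8 / 2^20   # 8 bytes per entry (double storage)

round(c(integer = mb_integer, double = mb_double), 1)
```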