My data.frame looks like this:

ID | test | test_result
1  |  B   |   10
2  |  A   |   9
3  |  A   |   11
4  |  C   |   7
5  |  F   |   5

And I want to get something like this:

test | test_result_ID1 | test_result_ID2 | test_result_ID3 ...
 A   |   NA           |     9           |   11
 B   |   10           |     NA          |   NA

It works with reshape() to the wide format with only a few cases, but with the whole data frame (about 23,000 IDs) reshape() takes too long. melt() and cast() do reshape the data, but they replace the values in test_result with the frequency of the test. Any other ideas how to manage this? Thanks!

Elisa

2 Answers

dcast from the reshape2 package does this:

library(reshape2)
# value.var names the column whose values fill the cells
dcast(data, test ~ ID, value.var = "test_result")

#  test  1  2  3  4  5
#1    A NA  9 11 NA NA
#2    B 10 NA NA NA NA
#3    C NA NA NA  7 NA
#4    F NA NA NA NA  5
David Arenburg
Alex
  • I've just tried it with the whole data.frame and it gives me this message: `Aggregation function missing: defaulting to length`, and again only the frequencies instead of the values. With only a few rows, however, it works. Do you know why? – Elisa Nov 11 '11 at 15:16
  • @Elisa This will happen when your `dcast` arguments result in more than one value in each cell of the result. If this happens, some kind of aggregation needs to happen, and the default function is `length`, i.e. a count. Do you perhaps have duplicate values in your data? Anyway, perhaps try `mean` as the aggregation function. – Andrie Nov 11 '11 at 22:27
  • @Andrie: the aggregation function stops the errors, but apparently `mean` doesn't work because: `argument is not numeric or logical: returning NA`. Is there any aggregation function like "just return the values"? – Elisa Nov 12 '11 at 10:59
  • Your problem is that there is more than one value to return, so you need to find a function that collapses multiple values into a single value. If your data is of class `character`, perhaps consider using `paste`? – Andrie Nov 12 '11 at 12:15
  • Solved it: the problem was that one ID for some reason had three rows instead of two. Removing the extra row with duplicated() fixed it. – Elisa Nov 12 '11 at 15:33
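
The fix described in the comments above can be sketched in base R roughly as follows. This is only an illustration under assumptions: the sample data is made up (ID 2 has one test recorded twice), and `is_dup` and `dedup` are hypothetical names, not taken from the thread.

```r
# Hypothetical sample data: ID 2 accidentally has test "C" recorded twice
mydf <- data.frame(
  ID = c(1, 1, 2, 2, 2),
  test = c("A", "F", "A", "C", "C"),
  test_result = c(10, 5, 9, 7, 7),
  stringsAsFactors = FALSE
)

# Flag rows whose (ID, test) pair has already appeared further up
is_dup <- duplicated(mydf[, c("ID", "test")])

# Inspect the offending rows before dropping anything
mydf[is_dup, ]

# Keep only the first row for each (ID, test) pair
dedup <- mydf[!is_dup, ]

# Every (ID, test) cell should now hold at most one value
stopifnot(all(table(dedup$ID, dedup$test) <= 1))
```

With at most one row per (ID, test) pair, `dcast` no longer has anything to aggregate, so `dedup` can be passed to it in place of `data`. Whether keeping the first row is the right choice depends on why the extra row exists.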

Another solution using reshape function in base R.

reshape(mydf, direction = 'wide', idvar = 'test', timevar = 'ID', 
  v.names = 'test_result', sep = "_")

EDIT. I see that you have already tried reshape and it took too long. Can you provide more details on your actual data?
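
If reshape() stays too slow, one possible alternative (not part of the original answer) is to fill a pre-allocated matrix by index in base R. This is a rough sketch using the example data from the question; `tests`, `ids` and `wide` are hypothetical names, and it assumes each (test, ID) pair occurs at most once, since a later duplicate would silently overwrite an earlier value.

```r
# Example data from the question
mydf <- data.frame(
  ID = 1:5,
  test = c("B", "A", "A", "C", "F"),
  test_result = c(10, 9, 11, 7, 5),
  stringsAsFactors = FALSE
)

tests <- sort(unique(mydf$test))
ids <- sort(unique(mydf$ID))

# Empty result: one row per test, one column per ID, filled with NA
wide <- matrix(NA_real_, nrow = length(tests), ncol = length(ids),
               dimnames = list(tests, paste0("test_result_", ids)))

# Fill every (test, ID) cell in a single vectorised assignment
wide[cbind(match(mydf$test, tests), match(mydf$ID, ids))] <- mydf$test_result

wide
```

The result is a numeric matrix rather than a data.frame, with the test names as row names. It avoids the per-group work that reshape() does, which may help with about 23,000 IDs, though this has not been timed on the real data.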

Ramnath
  • My original data has those three columns and about 23,000 rows. Each ID appears in two rows (one person has solved two tests, e.g. A and F, and therefore has two results and two rows). Might that be the problem? – Elisa Nov 11 '11 at 15:52