Reshape data for values in one column

Question

My data.frame looks like this:

ID | test | test_result
1  |  B   |   10
2  |  A   |   9
3  |  A   |   11
4  |  C   |   7
5  |  F   |   5

And I want to get something like this:

test | test_result_ID1 | test_result_ID2 | test_result_ID3 ...
 A   |   NA            |     9           |   11
 B   |   10            |     NA          |   NA

It works with reshape() to the wide format for a few cases, but with the whole data frame (about 23,000 IDs) reshape() takes too long. melt() and cast() do reshape the data, but they replace the values in test_result with the frequency of each test. Any other ideas how to manage this? Thanks!

http://stackoverflow.com/a/9617424/210673 now has a list of the various ways to do this. – Aaron left Stack Overflow Mar 23 '12 at 16:14

score 6 · Accepted Answer · edited Feb 21 '17 at 07:31


dcast from the reshape2 package does this:

library(reshape2)
data <- data.frame(ID = 1:5, test = c("B", "A", "A", "C", "F"), test_result = c(10, 9, 11, 7, 5))
dcast(data, test ~ ID, value.var = "test_result")

#  test  1  2  3  4  5
#1    A NA  9 11 NA NA
#2    B 10 NA NA NA NA
#3    C NA NA NA  7 NA
#4    F NA NA NA NA  5

edited Feb 21 '17 at 07:31 by David Arenburg · answered Nov 11 '11 at 13:29 by Alex

I've just tried it with the whole data.frame and it gives me this message: `Aggregation function missing: defaulting to length`, and again only the frequencies instead of the values. With only a few rows, however, it works. Do you know why? – Elisa Nov 11 '11 at 15:16

@Elisa This happens when your `dcast` arguments leave more than one value in a cell of the result. In that case the values have to be aggregated somehow, and the default function is `length`, which simply counts them. Do you perhaps have duplicate values in your data? Alternatively, try `mean` as the aggregation function (`fun.aggregate = mean`). – Andrie Nov 11 '11 at 22:27
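To check whether duplicates are the cause, a minimal base R sketch (assuming the columns are named `test`, `ID` and `test_result` as in the question; the small data set with a repeated row is made up for illustration) lists every row whose `test`/`ID` combination occurs more than once. Any row it returns is a cell that `dcast()` would have to aggregate.

```r
# Hypothetical example data: ID 2 has test A recorded twice
data <- data.frame(
  ID          = c(1, 2, 2, 3, 4, 5),
  test        = c("B", "A", "A", "A", "C", "F"),
  test_result = c(10, 9, 9, 11, 7, 5)
)

# Flag every row whose (test, ID) pair appears more than once,
# counting both the first occurrence and the later ones
key <- data[c("test", "ID")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
data[dup, ]
```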
@Andrie: the aggregation function stops the error, but `mean` apparently doesn't work: `argument is not numeric or logical: returning NA`. Is there an aggregation function that just returns the values? – Elisa Nov 12 '11 at 10:59
Your problem is that there is more than one value to return, so you need to find a function that collapses multiple values into a single value. If your data is of class `character`, perhaps consider using `paste`? – Andrie Nov 12 '11 at 12:15
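As a base R sketch of the `paste` idea (the example data, including the second result for ID 2, is hypothetical), `tapply()` applies a collapsing function to every `test`/`ID` cell and returns a `test` by `ID` matrix, with `NA` for cells that have no value:

```r
# Hypothetical data: ID 2 has two results for test A
data <- data.frame(
  ID          = c(1, 2, 3, 4, 5, 2),
  test        = c("B", "A", "A", "C", "F", "A"),
  test_result = c(10, 9, 11, 7, 5, 12)
)

# Collapse all values that fall into the same (test, ID) cell into one string
wide <- tapply(data$test_result, list(data$test, data$ID),
               function(x) paste(x, collapse = "/"))
wide
```

The repeated cell shows up as `"9/12"`, which makes the duplicates easy to spot instead of hiding them behind a count.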

Solved it: for some reason one ID had three rows instead of two. Removing the extra row with duplicated() fixed it. – Elisa Nov 12 '11 at 15:33
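A minimal sketch of that fix, assuming the extra row is a redundant copy that can be discarded (the data are hypothetical): `duplicated()` on the key columns keeps the first row of each `test`/`ID` pair, after which every cell holds a single value and `dcast()` no longer needs an aggregation function. If the repeated rows disagree in `test_result`, dropping one silently discards data, so it is worth inspecting them first.

```r
data <- data.frame(
  ID          = c(1, 2, 2, 3, 4, 5),
  test        = c("B", "A", "A", "A", "C", "F"),
  test_result = c(10, 9, 9, 11, 7, 5)
)

# Keep only the first row of each (test, ID) pair
clean <- data[!duplicated(data[c("test", "ID")]), ]
nrow(clean)
```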

score 0 · Answer 2 · answered Nov 11 '11 at 15:32


Another solution, using the reshape() function in base R:

reshape(mydf, direction = 'wide', idvar = 'test', timevar = 'ID', 
  v.names = 'test_result', sep = "_")

EDIT. I see that you have already tried reshape and it took too long. Can you provide more details on your actual data?
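Since reshape() was too slow on about 23,000 IDs, one alternative is to preallocate the result matrix and fill it with matrix indexing in a single vectorized assignment. This is a minimal base R sketch, not taken from either answer, assuming each `test`/`ID` pair occurs at most once (otherwise the last value silently wins); the example data are the rows from the question.

```r
data <- data.frame(
  ID          = c(1, 2, 3, 4, 5),
  test        = c("B", "A", "A", "C", "F"),
  test_result = c(10, 9, 11, 7, 5)
)

tests <- sort(unique(data$test))
ids   <- sort(unique(data$ID))

# One row per test, one column per ID, initialised to NA
wide <- matrix(NA_real_, nrow = length(tests), ncol = length(ids),
               dimnames = list(tests, paste0("test_result_ID", ids)))

# Each data row gives a (row, column) position; assign all values at once
wide[cbind(match(data$test, tests), match(data$ID, ids))] <- data$test_result
wide
```

Use `as.data.frame(wide)` if a data frame rather than a matrix is needed.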

answered by Ramnath

My original data has those three columns and about 23,000 rows. Each ID appears in two rows (one person took two tests, e.g., A and F, and therefore has two results and two rows). Might that be the problem? – Elisa Nov 11 '11 at 15:52
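Because each ID is expected to appear in exactly two rows here, counting rows per ID is a quick way to find the kind of problem that turned out to be the cause. A minimal base R sketch with made-up data in which ID 3 has an extra row:

```r
# Hypothetical data: ID 3 has test F recorded twice
data <- data.frame(
  ID          = c(1, 1, 2, 2, 3, 3, 3),
  test        = c("A", "F", "B", "C", "A", "F", "F"),
  test_result = c(9, 5, 10, 7, 11, 6, 6)
)

# Number of rows per ID
counts <- table(data$ID)

# IDs that do not have exactly two rows
names(counts)[counts != 2]
```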
