6

I have a dataframe that looks something like this

NUM <- c("45", "45", "45", "45", "48", "50", "66", "66", "66", "68")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)

df1 <- data.frame(NUM,Type,Points)

df1:

+-----+------+--------+
| NUM | TYPE | Points |
+-----+------+--------+
|  45 | A    | 9.2    |
|  45 | F    | 60.8   |
|  45 | C    | 22.9   |
|  45 | B    | 1012.7 |
|  48 | D    | 18.7   |
|  50 | A    | 11.1   |
|  66 | E    | 67.2   |
|  66 | C    | 63.1   |
|  66 | F    | 16.7   |
|  65 | D    | 58.4   |
+-----+------+--------+

I am trying to obtain an output that takes the rows in type column to convert it to individual columns.

Desired Output:

+-----+----------+----------+----------+----------+----------+----------+
| NUM | Points.A | Points.B | Points.C | Points.D | Points.E | Points.F |
+-----+----------+----------+----------+----------+----------+----------+
|  45 | 9.2      | 1012.7   | 22.9     | N/A      | N/A      | 60.8     |
|  48 | N/A      | N/A      | N/A      | 18.7     | N/A      | N/A      |
|  50 | 11.1     | N/A      | N/A      | N/A      | N/A      | N/A      |
|  66 | N/A      | N/A      | 63.1     | N/A      | 67.2     | 16.7     |
|  65 | N/A      | N/A      | N/A      | N/A      | 58.4     | N/A      |
+-----+----------+----------+----------+----------+----------+----------+

I tried using melt(df1) but doing it wrongly since the values in the rows are the NUM values rather than points. Kindly let me know how I could go about solving this.

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
Sharath
  • 2,225
  • 3
  • 24
  • 37

2 Answers2

11

You are looking for a basic "long" to "wide" reshaping process.

In base R, you can use the notorious reshape. For this type of data, the syntax is quite straightforward:

reshape(df1, direction = "wide", idvar = "NUM", timevar = "Type")
#    NUM Points.A Points.F Points.C Points.B Points.D Points.E
# 1   45      9.2     60.8     22.9   1012.7       NA       NA
# 5   48       NA       NA       NA       NA     18.7       NA
# 6   50     11.1       NA       NA       NA       NA       NA
# 7   66       NA     16.7     63.1       NA       NA     67.2
# 10  68       NA       NA       NA       NA     58.4       NA

You can also use the "tidyr" package, for several functions just wrap reshape2 but uses different syntax. In this case, the syntax would be:

> library(tidyr)
> spread(df1, Type, Points)
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • 1
    Works like charm!! Thanks Ananda. I just implemented both reshape & spread functions on my dataset and it looks like they pretty much take up the same amount of time to generate results. Do you know which one performs better? – Sharath Apr 21 '15 at 13:49
  • 3
    @Sharath, Depends on the size of your dataset. In general, you're going to get a lot more mileage out of the development version of "data.table" which has fast melting and casting capabilities, and goes beyond what "reshape2" can do. That said, I generally don't like to provide answers using development versions of packages.... – A5C1D2H2I1M1N2O1R2T1 Apr 21 '15 at 13:51
6

You can try dcast

library(reshape2)
dcast(df1, NUM~paste0('Points.',Type), value.var='Points')

Or you can convert to data.table and use dcast from the data.table. It would be faster

library(data.table)#v1.9.5+
dcast(setDT(df1), NUM~paste0('Points.',Type), value.var='Points')
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    Works superbly. is dcast faster than spread on large datasets? – Sharath Apr 21 '15 at 13:50
  • 3
    @Sharath For larger datasets, I would convert the 'data.frame to 'data.table' and use `dcast` from data.table. It would be very fast – akrun Apr 21 '15 at 13:51
  • 1
    Thanks for the suggestion. Looks like even Ananda proposed the same. I will convert it to data.table and use dcast for much larger datasets then. – Sharath Apr 21 '15 at 13:58