EID   Year Performance_rating
E100  2013                  4
E100  2014                  1
E200  2012                  4
E200  2013                  5
E200  2014                  2
E200  2015                  4

For modeling, I need the data with one row per EID (no duplicates) and the performance rating for each year in a separate column. (Note: some employees have 2 years of data, some 3, and some 4, depending on their joining date.)

EID Performance_rating_2012 Performance_rating_2013 Performance_rating_2014 Performance_rating_2015
E100                     NA                       4                       1                      NA
E200                      4                       5                       2                       4 

I tried multiple methods to solve this but failed, so I am posting here. Any answers would be much appreciated.

Matthew Lundberg

2 Answers


The tidyr package provides spread(), which handles exactly this kind of long-to-wide reshaping:

library(tidyr)
df %>% spread(Year, Performance_rating)

Resulting output is the wide data frame:

   EID 2012 2013 2014 2015
1 E100   NA    4    1   NA
2 E200    4    5    2    4
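In current tidyr releases, spread() is superseded by pivot_wider(). The following is a minimal sketch, assuming tidyr 1.1.0 or later and the question's data rebuilt by hand as df. It also adds the Performance_rating_ prefix that the question asks for in the column names:

```r
library(tidyr)

# The question's data, rebuilt by hand so the example is self-contained
df <- data.frame(
  EID = c("E100", "E100", "E200", "E200", "E200", "E200"),
  Year = c(2013, 2014, 2012, 2013, 2014, 2015),
  Performance_rating = c(4, 1, 4, 5, 2, 4)
)

# One row per EID; one column per year, prefixed as in the desired output
wide <- pivot_wider(
  df,
  names_from = Year,
  values_from = Performance_rating,
  names_prefix = "Performance_rating_",
  names_sort = TRUE  # order the new columns by year, not by first appearance
)
wide
```

Year and EID combinations that are missing from the long data (for example E100 in 2012) come out as NA in the wide result.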
Gopala

We can use dcast from the reshape2 package:

library(reshape2)
dcast(df1, EID~ paste0("Performance_rating_", Year), value.var="Performance_rating")
#  EID Performance_rating_2012 Performance_rating_2013 Performance_rating_2014 Performance_rating_2015
#1 E100                      NA                       4                       1                      NA
#2 E200                       4                       5                       2                       4
akrun
  • Can we pass multiple columns to value.var? I tried c("Performance_rating", "BGU") but got the error "Error in .subset2(x, i, exact = exact) : subscript out of bounds In addition: Warning message: In if (!(value.var %in% names(data))) { : the condition has length > 1 and only the first element will be used" – Sandeep Shetty Apr 17 '16 at 05:14
  • @Sandeep You can use data.table, as explained in Arun's answer here: http://stackoverflow.com/a/30517531 – Frank Apr 17 '16 at 05:16
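As a rough illustration of the data.table suggestion: its dcast method accepts several column names in value.var (data.table 1.9.6 or later). This sketch assumes the question's data plus a second measure called BGU, as in the comment above; the BGU values are made up purely for the example.

```r
library(data.table)

# The question's data plus a second measure; the BGU values are invented
dt <- data.table(
  EID = c("E100", "E100", "E200", "E200", "E200", "E200"),
  Year = c(2013, 2014, 2012, 2013, 2014, 2015),
  Performance_rating = c(4, 1, 4, 5, 2, 4),
  BGU = c(10, 20, 30, 40, 50, 60)
)

# Unlike reshape2::dcast, data.table's dcast takes a vector in value.var;
# each value column is spread across the years and named <column>_<year>
wide <- dcast(dt, EID ~ Year, value.var = c("Performance_rating", "BGU"))
names(wide)
```

The result has one row per EID, with a Performance_rating_<year> and a BGU_<year> column for every year in the data.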