4

First, here is the data tbl_df (simplified) I am using :

> mytbldf
Source: local data frame [6 x 5]

  iso2c country year     var1      var2
1    BI Burundi 2011 4.486265  6.693711
2    BI Burundi 2012 3.939242  5.330326
3    BI Burundi 2013 4.286439  5.747370
4    UG  Uganda 2011 3.998849 10.025680
5    UG  Uganda 2012 4.606198 13.416311
6    UG  Uganda 2013 4.746322 15.981362

I want to spread (in tidyr wording) the year variable over var1 and var2. After some (...) iterations, I found a syntax that works:

> recast(mytbldf, iso2c + country ~ variable + year, measure.var = c("var1","var2"))
  iso2c country var1_2011 var1_2012 var1_2013 var2_2011 var2_2012 var2_2013
1    BI Burundi  4.486265  3.939242  4.286439  6.693711  5.330326   5.74737
2    UG  Uganda  3.998849  4.606198  4.746322 10.025680 13.416311  15.98136

Three questions:

1) if I don't specify measure.var =, I get the following error:

> recast(mytbldf, iso2c + country ~ variable + year)
Using iso2c, country as id variables
Error in eval(expr, envir, enclos) : object 'year' not found

Why is that? From recast's man, I assumed it would take measure.var as all other variables?

2) So, is there a way to avoid specifying measure.var? In my real case, there are too many variables with too long names for explicitly specifying them.

3) Is there a better/simpler way to do that using reshape2 or tidyr that I am missing?

Alexandre Halm
  • 979
  • 1
  • 8
  • 18
  • 1
    Re: your third question, I think no in `tidyr`. Doing this in `tidyr` would require individual calls of `spread` for each variable you want to spread (two in your example but it sounds like more in your actual data) and then combining the results. According to [Hadley's tidy data paper](http://vita.had.co.nz/papers/tidy-data.pdf), your data are already tidy, which may be why this sort of operation is not supported. – Sam Firke May 04 '15 at 20:12
  • @SamFirke: you're right, the data is already tidy indeed. What I need to do now is the last step, where i have to produce (messy) outputs that can be injected in XLS spreadsheets – Alexandre Halm May 05 '15 at 06:59

1 Answers1

5

You could try the devel version of data.table which can take multiple value.var columns

library(data.table)#v1.9.5+
dcast(setDT(mytbldf), iso2c+country~year, value.var=c('var1', 'var2'))
#    iso2c country 2011_var1 2012_var1 2013_var1 2011_var2 2012_var2 2013_var2
#1:    BI Burundi  4.486265  3.939242  4.286439  6.693711  5.330326   5.74737
#2:    UG  Uganda  3.998849  4.606198  4.746322 10.025680 13.416311  15.98136

Or using reshape from base R

reshape(mytbldf, idvar=c('iso2c', 'country'), timevar='year', 
                 direction='wide')
#  iso2c country var1.2011 var2.2011 var1.2012 var2.2012 var1.2013 var2.2013
#1    BI Burundi  4.486265  6.693711  3.939242  5.330326  4.286439   5.74737
#4    UG  Uganda  3.998849 10.025680  4.606198 13.416311  4.746322  15.98136

Regarding recast it is just melt + dcast. So, if you are not specifying the id.var or measure.var in melt, the long format will be different than what you expected. For recast, you can specify the id.var as

  library(reshape2)
  recast(mytbldf, id.var=c('iso2c', 'country', 'year'),
            iso2c+country~variable+year)
  #  iso2c country var1_2011 var1_2012 var1_2013 var2_2011 var2_2012 var2_2013
  #1    BI Burundi  4.486265  3.939242  4.286439  6.693711  5.330326   5.74737
  #2    UG  Uganda  3.998849  4.606198  4.746322 10.025680 13.416311  15.98136

Also, if you know the column index, it would be easier than typing the names,

 recast(mytbldf, measure.var=4:5,  iso2c+country~variable+year)
 #   iso2c country var1_2011 var1_2012 var1_2013 var2_2011 var2_2012 var2_2013
 #1    BI Burundi  4.486265  3.939242  4.286439  6.693711  5.330326   5.74737
 #2    UG  Uganda  3.998849  4.606198  4.746322 10.025680 13.416311  15.98136
akrun
  • 874,273
  • 37
  • 540
  • 662