2

I'm having some problems with the reshape() function. Once I have reshaped, it is changing all my variable names to be "value.var 1" for example.

The code I am using to reshape is:

test<- reshape(cdc_city, idvar= "site", timevar="variable", 
               direction="wide", new.row.names=FALSE)

I am guessing I am using the "new.row.names=FALSE" part wrong. I have attempted substituting FALSE for NULL, but that also didn`t work.

My aim is to transform a data set as follows:

site    variable    value
site 1  var 1       4
site 1  var 2       7
site 1  var 3       2
site 1  var 4       6
site 1  var 5       3
site 2  var 1       89
site 2  var 2       43
site 2  var 3       12
site 2  var 4       54
site 2  var 5       23
site 3  var 1       76
site 3  var 2       62
site 3  var 3       13
site 3  var 4       43
site 3  var 5       23

into a data set like this:

site    var 1    var 2    var 3    var 4    var 5
1       4        7        2        6        3
2       89       43       12       54       23
3       76       62       13       43       23

If anyone knows how to get rid of the "value." appearing at the beginning of my variable names, that would be great! Or if there is a better bit of code I could use I am very open to that also.

Thanks,

Timothy

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
Timothy Alston
  • 1,501
  • 5
  • 18
  • 29
  • I edited your question since you are using the base R `reshape` function - not the `reshape` package. – Andrie Aug 13 '12 at 14:41
  • The base R function `reshape` is [notoriosuly difficult to use](http://stackoverflow.com/questions/10055602/wrapping-base-r-reshape-for-ease-of-use). I suggest you use the [reshape2](http://cran.r-project.org/web/packages/reshape2/index.html) package on CRAN instead. – Andrie Aug 13 '12 at 14:44
  • thanks for the edit. Will have a look at reshape2 package. That will be using a combination of "melt" and "cast" commands then? Will try work out how to do that – Timothy Alston Aug 13 '12 at 14:50

3 Answers3

4

For reshaping like this, xtabs is also useful:

xtabs(value ~ site + variable, data = cdc_city)
#         variable
# site     var 1 var 2 var 3 var 4 var 5
#   site 1     4     7     2     6     3
#   site 2    89    43    12    54    23
#   site 3    76    62    13    43    23

Regarding your question, though, I don't actually see this as a "problem". That's exactly how it is supposed to work.

Consider the following scenario. You have another column in your data.frame (we'll call it "other") and you want to reshape that whole dataset from long to wide.

Here's some sample data and output:

set.seed(1)
cdc_city$other = sample(1:20, 15, replace=TRUE)
reshape(cdc_city, direction="wide", 
        idvar="site", timevar="variable")
#      site value.var 1 other.var 1 value.var 2 other.var 2 value.var 3
# 1  site 1           4           6           7           8           2
# 6  site 2          89          18          43          19          12
# 11 site 3          76           5          62           4          13
#    other.var 3 value.var 4 other.var 4 value.var 5 other.var 5
# 1           12           6          19           3           5
# 6           14          54          13          23           2
# 11          14          43           8          23          16

In my opinion, tacking on the value and other to the variable names is essential in this case.

Finally, your new.row.names argument is useless here because your unhappiness is with the column names, not row names.

Update

Since I'm in the mood for sharing alternatives, you may also be interested in exploring aggregate, which also has fairly easy to understand syntax:

aggregate(list(var = cdc_city$value), 
          list(site = cdc_city$site), c)
#     site var.1 var.2 var.3 var.4 var.5
# 1 site 1     4     7     2     6     3
# 2 site 2    89    43    12    54    23
# 3 site 3    76    62    13    43    23

If you use aggregate, you also have some control over the names of the resulting variables.

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • Yes it`s not really a "problem" in that that is how it is meant to work, but for what I am doing it is not what I am wanting it to do. I know in stata there is code you can use to edit prefixes to variable names which would allow me to just delete all of the "value." at the beginning of the variable names. It is more a matter of aesthetics and to make it easier later on when building models so that I would not have to enter the "value." part each time I refer to a variable. I will have a look at the xtabs now though.. – Timothy Alston Aug 14 '12 at 08:06
  • @TimothyAlston, another option (if you are working with already transformed tables and just need to fix the variable names) is to use something like `names(cdc_city) = gsub("value.", "", names(cdc_city))`. – A5C1D2H2I1M1N2O1R2T1 Aug 14 '12 at 08:59
2

I suggest you use the reshape2 package instead - it's much easier to use melt and dcast than the built-in reshape():

library(reshape2)
dcast(cdc_city, site~variable, mean)
    site var 1 var 2 var 3 var 4 var 5
1 site 1     4     7     2     6     3
2 site 2    89    43    12    54    23
3 site 3    76    62    13    43    23
Andrie
  • 176,377
  • 47
  • 447
  • 496
  • thanks, that has worked well. It has however changed the order of my variables. My actual data set has about 70, which now seem to reordered in a random manner. Is there a way to stop this, or is this something that shouldn`t be expected? – Timothy Alston Aug 13 '12 at 15:02
  • @TimothyAlston Please provide a reproducible example that exhibits this behaviour (either in this question, or a new question). The answer will probably involve defining the factor levels of your variable in your desired order. – Andrie Aug 13 '12 at 15:05
  • It seems to have ranked my variables alphabetically. Which is fine, I can still work with it easily, and is less hassle than finding a way around. Thanks for your help! – Timothy Alston Aug 13 '12 at 15:33
  • @Andrie, why `mean` in your example? Also, I don't think that base R `reshape` is that difficult to use. The main problem I see (which is also its strength) is that it is a single function for reshaping in both directions, and the arguments you use differ according to the direction you're going. For that reason, I appreciate [Ari](http://stackoverflow.com/users/636656/ari-b-friedman)'s attempt at creating a more "elegant" ;-) wrapper for it. – A5C1D2H2I1M1N2O1R2T1 Aug 13 '12 at 16:12
  • @mrdwab Habit, I guess - in this case one probably doesn't need an aggregation function. As for the ease of use of `reshape()` - I have **really** tried to use it. It's very fast, very powerful and very flexible. But it's completely impenetrable to understand which arguments to use in which cases. The fact that `reshape` was designed with panel data in mind doesn't make it any easier, except when you have panel data, of course :-) – Andrie Aug 13 '12 at 16:29
1

Yes, this can be done with built-in base::reshape().

For direction=="wide", you can rename reshaped variables using a list with the varying argument. As you have discovered, reshape will generate a column name that looks like `v.name`.`times`. R documentation erroneously suggests that varying accepts a vector of names (it does for direction=="long").

cdc_city <- data.frame(
    site=paste("site", rep(1:3, each=5)),
    variable=paste("var", rep(1:5, 3)), 
    value=c(4,7,2,6,3,89,43,12,54,23,76,62,13,43,23))

#       site variable value
#  1  site 1    var 1     4
#  2  site 1    var 2     7
#  3  site 1    var 3     2
#  4  site 1    var 4     6
#  5  site 1    var 5     3
#  6  site 2    var 1    89
#  7  site 2    var 2    43
#  8  site 2    var 3    12
#  9  site 2    var 4    54
#  10 site 2    var 5    23
#  11 site 3    var 1    76
#  12 site 3    var 2    62
#  13 site 3    var 3    13
#  14 site 3    var 4    43
#  15 site 3    var 5    23

test <- reshape(cdc_city,
    varying=list(c("var 1", "var 2", "var 3", "var 4", "var 5")),
    idvar= "site", timevar="variable", direction="wide")

#       site var 1 var 2 var 3 var 4 var 5
#  1  site 1     4     7     2     6     3
#  6  site 2    89    43    12    54    23
#  11 site 3    76    62    13    43    23

Be careful about data.frames where strings are factors. You can specify varying from a factor in the data.frame directly using

reshape( ..., varying=list(as.character(unique(cdc_city$variable))), ...)
Dave
  • 2,396
  • 2
  • 22
  • 25