I have a wide zoo
with two variables per individuals (maybe this is a double-wide?), that I would like to convert to a long panel with two columns (one for each variables) so that I can run panel regressions (say with the plm
package).
This is trivial with melt
and cast
from the reshape
package, but with the size of these data I run into memory and speed issues. So I'd like to use all base R, but I am having a hard time with the base reshape
function. From the (double-)wide zoo
object I can get the "melted" long data frame with stack
, but I'm afraid that this will fail silently and would prefer the precision of reshape
, but I can't get the syntax correct.
Is there a reshape
equivalent of stack
? The only syntax that I can get to run still leaves me with a (single-)wide data frame with individuals in columns and variables rows?
Below is the relevant code block in case I'm totally off base, but the highlighted block is where I'm stuck and would like to replace stack
with reshape
.
# similar data
library(zoo)
dates <- seq(as.Date("2004-01-01"), by=7, len=52*10)
tickers <- letters
my.df <- data.frame(dates=as.Date(rep_len(dates, length(dates)*length(tickers))),
tickers=rep(tickers, each=length(dates)),
SVI=runif(length(dates)*length(tickers), min=0, max=100))
svi <- read.zoo(my.df, index.column="dates", split="tickers")
meansvi <- rollmean(svi, 52, align="right")
my.zoo <- merge(svi, meansvi)
############################################################
# I need help here -- what is the base::reshape equivalent?
############################################################
# reshape to long data frame
my.df2 <- as.data.frame(my.zoo)
my.df2$dates <- index(my.zoo)
my.df3 <- stack(my.df2, select=-dates)
my.df3$dates <- as.Date(rep_len(index(my.zoo), nrow(my.df3)))
# this is close, but still "single-wide"
my.bad <- reshape(my.df2,
varying=setdiff(names(my.df2), "dates"),
direction="long")
############################################################
# end help region
############################################################
# split "ind" to recover ticker and variable
my.df3$ind <- sub("\\.", " ", my.df3$ind)
inds.list <- strsplit(my.df3$ind, " ")
inds.df <- do.call(rbind.data.frame, inds.list)
names(inds.df) <- c("tickers", "variables")
my.df3 <- cbind(my.df3, inds.df)
my.df3$ind <- NULL
# reshape long data frame with variables in columns
my.df4 <- reshape(my.df3, timevar="variables", idvar=c("dates", "tickers"), direction="wide")
names(my.df4) <- sub("values.", "", names(my.df4))
Edit -- more details.
The output of my stack
call has the correct form, but I would like to produce it with base reshape
. Here's the head
of my.df3
, which is the "melted" long version of my data (i.e., date-individual-variables pairs in rows).
> head(my.df3)
values dates tickers variables
1 16.32171 2004-01-01 a svi
2 30.77852 2004-01-08 a svi
3 14.75164 2004-01-15 a svi
4 49.77205 2004-01-22 a svi
5 59.05023 2004-01-29 a svi
6 55.86036 2004-02-05 a svi
The reshape
call that I show above won't produce the same output as stack
and leaves my data as a wide data frame, but with a row for each date-variable combination. This is my.bad
in the code above.
> my.bad[1:5, 1:5]
dates time a b c
1.svi 2004-01-01 svi 16.32171 91.314077 93.19167
2.svi 2004-01-08 svi 30.77852 17.907831 61.73727
3.svi 2004-01-15 svi 14.75164 13.730005 53.04939
4.svi 2004-01-22 svi 49.77205 5.987645 79.19428
5.svi 2004-01-29 svi 59.05023 38.226002 88.26567
I can achieve the overall desired output with the code above, it's just the intermediate step that I would like to replace with reshape
instead of stack
. Here is the ultimate desired output, which is long with one column per variables with date-individual pairs in the rows.
> head(my.df4)
dates tickers svi meansvi
1 2004-01-01 a 16.32171 NA
2 2004-01-08 a 30.77852 NA
3 2004-01-15 a 14.75164 NA
4 2004-01-22 a 49.77205 NA
5 2004-01-29 a 59.05023 NA
6 2004-02-05 a 55.86036 NA
Maybe there's a way to completely skip my concern about stack
failing silently?