Creating a reshaped dataframe with a for-loop

Question

I am trying to write a for loop that can reorganize a dataframe into a table for publication, e.g. in Excel.

Here is a small sample of the data for my problem:

df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

I would like to produce a table that lists each state once in a "State" column, with one column per year titled "<year> Pop" and each X value under the proper year. In this case, it would look like this:

State | 2010 Pop | 2011 Pop
NY    | .25      | .24
NJ    | .24      | .23
PA    | .23      | .22

I have about a dozen years and data for all states, so I can do this laboriously by making a dataframe for each year, renaming the column names, binding the columns, and eliminating repeated ST columns. But, my intuition is that there is a more efficient way to do this. Would appreciate help thinking through this. Thanks!

This is called "reshaping" and in particular "reshaping from long to wide". Base R has a function called `reshape` for this purpose. Other packages like `tidyr` have "friendlier" functions for the same purpose (such as `tidyr::spread`). — DanY, Aug 29 '18 at 16:02

score 0 · Answer 1 · answered Aug 29 '18 at 16:01

You can try the `reshape` function:

df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

reshape(df, timevar = "YR", idvar = "ST", direction = "wide")

  ST X.2010 X.2011
1 NY   0.25   0.24
2 NJ   0.24   0.23
3 PA   0.23   0.22
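The `reshape` output uses column names like `X.2010` rather than the "2010 Pop" headers asked for in the question. A minimal sketch of one way to rename the columns and export the table, assuming the same sample data; the temporary output file `out_file` is a hypothetical placeholder for wherever the table should be saved:

```r
df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

wide <- reshape(df, timevar = "YR", idvar = "ST", direction = "wide")

# Turn "X.2010" into "2010 Pop", and "ST" into "State"
names(wide) <- sub("^X\\.([0-9]+)$", "\\1 Pop", names(wide))
names(wide)[names(wide) == "ST"] <- "State"

# Write a CSV that Excel can open (hypothetical output location)
out_file <- tempfile(fileext = ".csv")
write.csv(wide, out_file, row.names = FALSE)
```

The same renaming works unchanged with a dozen years, since the pattern matches any `X.<year>` column.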

score 0 · Answer 2 · answered Aug 29 '18 at 16:02

This will give you the outcome you want using tidyverse packages.

library(tidyr)
library(dplyr)

df %>% 
  mutate(YR = paste(YR, "Pop")) %>% # add Pop behind year for column headers
  rename(State = ST) %>% # rename state column for outcome
  spread(YR, X)

  State 2010 Pop 2011 Pop
1    NJ     0.24     0.23
2    NY     0.25     0.24
3    PA     0.23     0.22
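In tidyr 1.0.0 and later, `spread` is superseded by `pivot_wider`. A rough equivalent of the pipeline above might look like the sketch below, assuming a tidyr version that provides `pivot_wider`; the renaming is done in base R here only to keep the example short:

```r
library(tidyr)

df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

# Add "Pop" behind the year so it ends up in the column headers
df$YR <- paste(df$YR, "Pop")

# One row per state, one column per year
wide <- pivot_wider(df, names_from = YR, values_from = X)
names(wide)[names(wide) == "ST"] <- "State"

# Convert the tibble to a plain data frame if that is preferred
wide <- as.data.frame(wide)
```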

P.S. I also voted to close this question, as there are already many questions on the site with answers like this. For now, though, this might help you solve your issue.

Thank you, this was a straightforward approach. — Aaron, Aug 29 '18 at 17:06

G. Grothendieck · Accepted Answer · 2018-09-01T14:40:30.387

1) xtabs Using base R we get a short simple solution without needing any for loop or package dependencies:

xtabs(X ~ ., df)

giving:

    YR
ST   2010 2011
  NJ 0.24 0.23
  NY 0.25 0.24
  PA 0.23 0.22

2) tapply This also uses only base R:

tapply(df[[3]], df[-3], sum)

giving:

    YR
ST   2010 2011
  NJ 0.24 0.23
  NY 0.25 0.24
  PA 0.23 0.22
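Two details of (1) and (2) may matter with real data. Both sum `X` within each state/year cell, and they treat a missing state/year combination differently: `xtabs` reports 0, while `tapply` reports `NA`. Both also return a table or matrix rather than a data frame. The sketch below illustrates this under the assumption that one row is dropped from the sample data (`df_gap` is a made-up name for illustration), then converts the `tapply` result into a data frame with the headers requested in the question:

```r
df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

# Drop the PA/2011 row to create a missing combination (illustration only)
df_gap <- df[-6, ]

xt <- xtabs(X ~ ST + YR, df_gap)
ta <- tapply(df_gap$X, df_gap[c("ST", "YR")], sum)

xt["PA", "2011"]  # 0: xtabs fills empty cells with zero
ta["PA", "2011"]  # NA: tapply leaves empty cells missing

# Convert the full tapply matrix into a data frame with a State column
m <- tapply(df$X, df[c("ST", "YR")], sum)
wide <- data.frame(State = rownames(m), m,
                   check.names = FALSE, row.names = NULL)
names(wide)[-1] <- paste(colnames(m), "Pop")
```

If a zero population share would be misleading in the published table, the `NA` behaviour of `tapply` is probably the safer default.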

3) zoo This creates an annual time series for each state:

library(zoo)
z <- read.zoo(df, index = "YR", split = "ST")

giving:

> z
       NJ   NY   PA
2010 0.24 0.25 0.23
2011 0.23 0.24 0.22

A data frame can be produced using fortify.zoo(z).

Creating a time series may simplify subsequent operations. For example, this would plot it:

library(ggplot2)
autoplot(z) + scale_x_continuous(breaks = time(z)) # separate panels
autoplot(z, facet = NULL) + scale_x_continuous(breaks = time(z)) # single panel

3a) It makes more sense to use a column for each state if we are going to produce a time series but it would be possible to use the states as "times":

read.zoo(df, index = "ST", split = "YR", FUN = identity)

giving this zoo series which, again, could be converted to data frame using fortify.zoo.

   2010 2011
NJ 0.24 0.23
NY 0.25 0.24
PA 0.23 0.22

4) tidyr @Dan Y already provided a tidyr solution in the comments under the question so we just point that out.
