I am trying to write a for loop that can reorganize a data frame into a table for publication, e.g. in Excel.

Here is a small sample of the data for my problem:

df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
             YR = c(2010, 2010, 2010, 2011, 2011, 2011),
             X = c(.25, .24, .23, .24, .23, .22)) 

I would like to produce a table that lists each state once in a "State" column, with each year as a column (titled like "2010 Pop") and each X value under the proper year. In this case, it would look like this:

State | 2010 Pop | 2011 Pop
NY    | .25      | .24
NJ    | .24      | .23
PA    | .23      | .22

I have about a dozen years and data for all states, so I can do this laboriously by making a data frame for each year, renaming the columns, binding the columns, and eliminating the repeated ST columns. But my intuition is that there is a more efficient way to do this. I would appreciate help thinking through this. Thanks!

anothermh
Aaron
  • Try with `spread`: `spread(df, YR, X)` – akrun Aug 29 '18 at 15:55
  • This is called "reshaping" and in particular "reshaping from long to wide". Base R has a function called `reshape` for this purpose. Other packages like `tidyr` have "friendlier" functions for the same purpose (such as `tidyr::spread`). – DanY Aug 29 '18 at 16:02

3 Answers

You can try the `reshape` function:

df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

reshape(df, timevar = "YR", idvar = "ST", direction = "wide")

  ST X.2010 X.2011
1 NY   0.25   0.24
2 NJ   0.24   0.23
3 PA   0.23   0.22
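
The `X.2010`-style names above differ from the "2010 Pop" headers asked for in the question. A minimal sketch of one way to relabel them in base R and export the result for Excel follows; the regular expression assumes the year columns are always named `X.<digits>`, and `out_file` is a hypothetical temporary path used only for illustration.

```r
df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

# Long to wide: one row per state, one X column per year
wide <- reshape(df, timevar = "YR", idvar = "ST", direction = "wide")

# Turn "X.2010" into "2010 Pop" (assumes names of the form X.<digits>)
names(wide) <- sub("^X\\.([0-9]+)$", "\\1 Pop", names(wide))

# Rename the id column for the published table
names(wide)[names(wide) == "ST"] <- "State"

# Write a CSV that Excel can open; out_file is a hypothetical path
out_file <- tempfile(fileext = ".csv")
write.csv(wide, out_file, row.names = FALSE)
```

Because the pattern matches any number of year columns, the same lines should work unchanged for a dozen years.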
Darren Tsai

This will give you the outcome you want using tidyverse packages.

library(tidyr)
library(dplyr)

df %>% 
  mutate(YR = paste(YR, "Pop")) %>% # add Pop behind year for column headers
  rename(State = ST) %>% # rename state column for outcome
  spread(YR, X)

  State 2010 Pop 2011 Pop
1    NJ     0.24     0.23
2    NY     0.25     0.24
3    PA     0.23     0.22
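
In newer tidyr releases `spread` is superseded by `pivot_wider`. The sketch below shows a roughly equivalent call, assuming a tidyr version recent enough to provide `pivot_wider` and its `names_glue` argument; the object name `wide` is only illustrative.

```r
library(tidyr)

df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

# One column per year; names_glue builds the "<year> Pop" headers directly
wide <- pivot_wider(df,
                    names_from = YR,
                    values_from = X,
                    names_glue = "{YR} Pop")

# Rename the id column for the published table
names(wide)[names(wide) == "ST"] <- "State"

wide
```

Building the headers inside `pivot_wider` removes the need for the separate `mutate` step.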

P.S. I also voted to close this question, as there are already many questions with answers like this. But for now it might help you solve your issue.

phiver

1) xtabs Using base R we get a short simple solution without needing any for loop or package dependencies:

xtabs(X ~ ., df)

giving:

    YR
ST   2010 2011
  NJ 0.24 0.23
  NY 0.25 0.24
  PA 0.23 0.22
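
The result of `xtabs` is a table object rather than a data frame, and calling `as.data.frame` on it returns the long format again. A minimal sketch of one way to get a wide data frame with a "State" column, using `as.data.frame.matrix`; the names `tab` and `wide` are illustrative only.

```r
df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

tab <- xtabs(X ~ ., df)

# Keep the wide layout: treat the table as a matrix when converting
wide <- as.data.frame.matrix(tab)

# Append " Pop" to the year columns
names(wide) <- paste(names(wide), "Pop")

# Move the state labels from row names into a proper column
wide <- cbind(State = rownames(wide), wide)
rownames(wide) <- NULL

wide
```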

2) tapply This also uses only base R:

tapply(df[[3]], df[-3], sum)

giving:

    YR
ST   2010 2011
  NJ 0.24 0.23
  NY 0.25 0.24
  PA 0.23 0.22
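
The two base R approaches agree on this sample, but they treat a missing state/year combination differently: `xtabs` fills the cell with 0, while `tapply` leaves it as NA. The sketch below illustrates this on a hypothetical copy of the data, `df2`, with the PA 2011 row dropped.

```r
df <- data.frame(ST = c("NY", "NJ", "PA", "NY", "NJ", "PA"),
                 YR = c(2010, 2010, 2010, 2011, 2011, 2011),
                 X = c(.25, .24, .23, .24, .23, .22))

# Hypothetical incomplete data: remove the PA 2011 observation
df2 <- df[-6, ]

xt <- xtabs(X ~ ., df2)
ta <- tapply(df2[[3]], df2[-3], sum)

xt["PA", "2011"]  # 0: xtabs sums over an empty cell
ta["PA", "2011"]  # NA: tapply marks the empty cell as missing
```

For a publication table, NA is usually the safer signal, since a 0 could be mistaken for a real value.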

3) zoo This creates an annual time series for each state:

library(zoo)
z <- read.zoo(df, index = "YR", split = "ST")

giving:

> z
       NJ   NY   PA
2010 0.24 0.25 0.23
2011 0.23 0.24 0.22

A data frame can be produced using fortify.zoo(z).

Creating a time series may be useful because it simplifies subsequent operations. For example, this would plot it:

library(ggplot2)
autoplot(z) + scale_x_continuous(breaks = time(z)) # separate panels
autoplot(z, facet = NULL) + scale_x_continuous(breaks = time(z)) # single panel

3a) It makes more sense to use a column for each state if we are going to produce a time series but it would be possible to use the states as "times":

read.zoo(df, index = "ST", split = "YR", FUN = identity)

giving this zoo series which, again, could be converted to data frame using fortify.zoo.

   2010 2011
NJ 0.24 0.23
NY 0.25 0.24
PA 0.23 0.22

4) tidyr @DanY already mentioned a tidyr solution in the comments under the question, so we just point that out.

G. Grothendieck