Creating a new variable that aggregates two years of observations

Question

I would like to plot some data in a data set, with the frequency of x over time y, which is in years. I've been able to manipulate the data into a data frame that holds the frequency of certain binary string data. As it currently stands, I have the frequency by year, with two rows per year so that I can plot the frequency of the two binary outcomes. However, I would like to plot each outcome as a percentage of the total observations for its year.

df <- data.frame( x = c("1980", "1980", "1981", "1981", "1982", "1982" ),
             y = c("yes", "no", "yes", "no", "yes", "no"),
             z = c("26", "18", "32", "12", "18", "16"))

Initially I tried aggregating the observations by year with the code below, but the result has only 32 rows of data when I need 64.

df1$Sum <- aggregate(df1$z, by=list(df1$x), FUN=sum)

Is there some way I can duplicate the observations by year, so that a new column contains the sum of both "yes" and "no" for 1980 in both rows 1 and 2?

You can start by creating a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – zero323, Sep 13 '13 at 19:45
Creating a new column in a data frame with aggregate makes no sense. Aggregate is supposed to reduce the dimensionality of a data frame - and therefore would not fit in the original data frame. — dayne, Sep 13 '13 at 19:52

score 1 · Answer 1 · answered Sep 13 '13 at 20:09

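library(data.table)
dt = data.table(your_df)  # your_df is the data frame from the question

# add the per-year total of z to every row (no rows are dropped)
dt[, z.sum := sum(z), by = x]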

This assumes your column z actually holds numbers. That is not the case in the example in the question, but I assume that's a typo.
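As a fuller sketch of the same idea, the snippet below starts from the example data in the question, converts z to numeric first, and then derives a percentage from the per-year total. The column name pct is an illustrative choice, not something from the original post.

```r
library(data.table)

# Example data from the question; z is stored as character strings
dt <- data.table(x = c("1980", "1980", "1981", "1981", "1982", "1982"),
                 y = c("yes", "no", "yes", "no", "yes", "no"),
                 z = c("26", "18", "32", "12", "18", "16"))

# Convert z to numeric so that sum() works
dt[, z := as.numeric(z)]

# Add the per-year total to every row; the row count stays the same
dt[, z.sum := sum(z), by = x]

# Percentage of the yearly total for each row (pct is an illustrative name)
dt[, pct := 100 * z / z.sum]
```

Because `:=` adds columns by group without collapsing rows, both the "yes" and the "no" row of a given year carry the same z.sum.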

answered Sep 13 '13 at 20:09

eddi


Henrik · Accepted Answer · 2013-09-13T20:36:25.887

If your goal is to "plot the percentage of the total of these observations by year", I assume you don't have to go via sums.

Here is one possibility to get percentages per year:

library(plyr)
df <- data.frame( x = c("1980", "1980", "1981", "1981", "1982", "1982" ),
                  y = c("yes", "no", "yes", "no", "yes", "no"),
                  z = c("26", "18", "32", "12", "18", "16"))
df$z <- as.numeric(as.character(df$z))

df2 <- ddply(.data = df, .variables = .(x), mutate,
             prop = z/sum(z))
df2
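For comparison, here is a minimal base R sketch of the same calculation without plyr, using ave() to repeat each year's total on every row of that year. The column names total and pct are illustrative, and stringsAsFactors = FALSE is set explicitly so that z is read as character regardless of the R version.

```r
# Example data from the question, kept as character strings
df <- data.frame(x = c("1980", "1980", "1981", "1981", "1982", "1982"),
                 y = c("yes", "no", "yes", "no", "yes", "no"),
                 z = c("26", "18", "32", "12", "18", "16"),
                 stringsAsFactors = FALSE)
df$z <- as.numeric(df$z)

# ave() returns a vector as long as its input, so it fits back into df:
# each row receives the sum of z over all rows sharing the same year
df$total <- ave(df$z, df$x, FUN = sum)

# Share of the yearly total, as a percentage (pct is an illustrative name)
df$pct <- 100 * df$z / df$total
```

Unlike aggregate(), which returns one row per group, ave() keeps the original number of rows, which is what the question asks for.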

edited Sep 13 '13 at 20:36

answered Sep 13 '13 at 20:24

Henrik


watch out for factors and `as.numeric` – eddi Sep 13 '13 at 20:34
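The pitfall behind this comment can be shown with a short sketch. In R versions before 4.0.0, data.frame() converted strings to factors by default, and calling as.numeric() directly on a factor returns the internal level codes rather than the values. The variable names below are illustrative.

```r
# z as it would appear if the strings had been converted to a factor
z <- factor(c("26", "18", "32", "12", "18", "16"))

# Direct conversion yields the level codes, not the numbers
codes <- as.numeric(z)
codes   # 4 3 5 1 3 2

# Going through character first recovers the original values
values <- as.numeric(as.character(z))
values  # 26 18 32 12 18 16
```

This is why the answer above converts with as.numeric(as.character(df$z)) instead of as.numeric(df$z).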
