I would like to plot some data with the frequency of x over time y, which is in years. I've been able to manipulate the data into a data frame that holds the frequency of a binary string variable. As it currently stands, I have the frequency by year, with two rows per year, so that I can plot the frequency of each binary outcome. However, I would like to plot each outcome as a percentage of the total observations for that year.

df <- data.frame( x = c("1980", "1980", "1981", "1981", "1982", "1982" ),
             y = c("yes", "no", "yes", "no", "yes", "no"),
             z = c("26", "18", "32", "12", "18", "16"))

Initially I tried aggregating the observations by year with this code, but the result has only 32 rows when I need 64.

df1$Sum <- aggregate(df1$z, by=list(df1$x), FUN=sum)

Is there some way I can duplicate the observations by year, so that a new column contains the sum of both "yes" and "no" for 1980 in both rows 1 and 2?

Darin Self
  • You can start by creating a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – zero323 Sep 13 '13 at 19:45
  • Creating a new column in a data frame with aggregate makes no sense. Aggregate is supposed to reduce the dimensionality of a data frame - and therefore would not fit in the original data frame. – dayne Sep 13 '13 at 19:52

2 Answers

library(data.table)
dt = data.table(your_df)

dt[, z.sum := sum(z), by = x]

This assumes your column z actually holds numbers. That is not the case in the OP, but I assume that's a typo.
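
For readers who would rather not load a package, here is a minimal base R sketch of the same idea. It assumes z is converted to numeric first, and the column names z.sum and pct are only illustrative. ave() returns the group-wise result repeated for every row of the group, so it fits next to the original rows, unlike aggregate(), which returns one row per year.

```r
# Example data from the question; z is stored as strings there.
df <- data.frame(x = c("1980", "1980", "1981", "1981", "1982", "1982"),
                 y = c("yes", "no", "yes", "no", "yes", "no"),
                 z = c("26", "18", "32", "12", "18", "16"),
                 stringsAsFactors = FALSE)

# Convert the counts to numbers before summing.
df$z <- as.numeric(df$z)

# Per-year total, repeated on each row of that year.
df$z.sum <- ave(df$z, df$x, FUN = sum)

# Share of each outcome within its year, in percent.
df$pct <- 100 * df$z / df$z.sum

df
```

With the example data, both 1980 rows get a z.sum of 44 (26 + 18), which is the duplicated per-year sum the question asks for.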

eddi

If your goal is to "plot the percentage of the total of these observations by year", I assume you don't have to go via sums.

Here is one possibility to get percentages per year:

library(plyr)
df <- data.frame( x = c("1980", "1980", "1981", "1981", "1982", "1982" ),
                  y = c("yes", "no", "yes", "no", "yes", "no"),
                  z = c("26", "18", "32", "12", "18", "16"))
df$z <- as.numeric(as.character(df$z))

df2 <- ddply(.data = df, .variables = .(x), mutate,
             prop = z/sum(z))
df2
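
If a year-by-outcome layout is more convenient for plotting, the same shares can also be sketched in base R with xtabs() and prop.table(). This is an alternative under the same assumption that z has been converted to numeric; the names tab and pct are illustrative only.

```r
# Same example data as above, with z already numeric.
df <- data.frame(x = c("1980", "1980", "1981", "1981", "1982", "1982"),
                 y = c("yes", "no", "yes", "no", "yes", "no"),
                 z = c(26, 18, 32, 12, 18, 16))

# Cross-tabulate the counts: one row per year, one column per outcome.
tab <- xtabs(z ~ x + y, data = df)

# Divide each row by its row total (margin = 1) and scale to percent.
pct <- 100 * prop.table(tab, margin = 1)

pct
```

Each row of pct sums to 100, so the "yes" and "no" columns can be drawn as two lines over the years.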
Henrik