1

I've begun to gradually shift to R from Excel but I'm still having some difficulties with (relatively simple) calculations.

I want to create a frequency version of my variable x, let's call it "xfrequency".

Please, see the sample of my data below.

The desired variable xfrequency should basically count the number of x's, during a certain period (country-year). In the sample data the observation period is from 1990 to 1995. So, in 1994 Canada recieved 4 x's in total.

Perhaps there is a relevant function for this out there? Thanks!

country year    x   xfrequency
CAN     1990    1   1
CAN     1991    0   0
CAN     1992    1   2
CAN     1993    0   0
CAN     1994    2   4
CAN     1995    1   5
USA     1990    0   0
USA     1991    2   2
USA     1992    1   3
USA     1993    0   0
USA     1994    1   4
USA     1995    0   0
GER     1990    NA  NA
GER     1991    1   1
GER     1992    0   0
GER     1993    1   2
GER     1994    2   4
GER     1995    1   5
FKG
  • 285
  • 1
  • 4
  • 17

3 Answers3

2

Example with data.table assuming your dataset in in a variable called data:

library(data.table)
setDT(data)
data[is.na(x),x := 0] # Remove the NA as a sum of anything with NA is NA
data[, xfreq := cumsum(x), by=country]

Which gives:

    country year x xfrequency xfreq
 1:     CAN 1990 1          1     1
 2:     CAN 1991 0          0     1
 3:     CAN 1992 1          2     2
 4:     CAN 1993 0          0     2
 5:     CAN 1994 2          4     4
 6:     CAN 1995 1          5     5
 7:     USA 1990 0          0     0
 8:     USA 1991 2          2     2
 9:     USA 1992 1          3     3
10:     USA 1993 0          0     3
11:     USA 1994 1          4     4
12:     USA 1995 0          0     4
13:     GER 1990 0         NA     0
14:     GER 1991 1          1     1
15:     GER 1992 0          0     1
16:     GER 1993 1          2     2
17:     GER 1994 2          4     4
18:     GER 1995 1          5     5

this is not exactly your expected output, but according to the description you give, the xfreq column seems to be what you're looking for.

To get your exact output, we can reset the xfreq to 0 when x is 0:

> data[x==0,xfreq := 0]
> data
    country year x xfrequency xfreq
 1:     CAN 1990 1          1     1
 2:     CAN 1991 0          0     0
 3:     CAN 1992 1          2     2
 4:     CAN 1993 0          0     0
 5:     CAN 1994 2          4     4

Or in one pass with a test:

data[, xfreq := ifelse(x==0,0L,cumsum(x)), by=country]
Tensibai
  • 15,557
  • 1
  • 37
  • 57
2

A base R alternative:

mydf <- transform(mydf, xfreq = ave(x, country, FUN = function(x) cumsum(!is.na(x))))
mydf[mydf$x==0 | is.na(mydf$x), "xfreq"] <- 0

gives:

> mydf
   country year  x xfrequency xfreq
1      CAN 1990  1          1     1
2      CAN 1991  0          0     0
3      CAN 1992  1          2     3
4      CAN 1993  0          0     0
5      CAN 1994  2          4     5
6      CAN 1995  1          5     6
7      USA 1990  0          0     0
8      USA 1991  2          2     2
9      USA 1992  1          3     3
10     USA 1993  0          0     0
11     USA 1994  1          4     5
12     USA 1995  0          0     0
13     GER 1990 NA         NA     0
14     GER 1991  1          1     1
15     GER 1992  0          0     0
16     GER 1993  1          2     3
17     GER 1994  2          4     4
18     GER 1995  1          5     5
Jaap
  • 81,064
  • 34
  • 182
  • 193
1

You can use library(dplyr).

library(dplyr)
sum_data <- data %>% group_by(country) %>% summarise(xfrequency = sum(x, na.rm=T)).

I just grouped your data by country and added sum of x for all the periods given for this country.

DenisK
  • 140
  • 1
  • 8
  • 1
    This does no give the xfrequency column OP's is looking for. – Tensibai Feb 17 '16 at 08:41
  • Thanks DenisK. But this generates same results as in plyr: `newdata<- ddply(mydata, ~country, summarise, xfrequency=sum(x, na.rm=TRUE))` and it gives the total sum of x's for all years combined. So not what i wanted, but perhaps useful to know in future – FKG Feb 17 '16 at 08:55