How to create a variable that would capture only increases?

Question

I'm having trouble describing what I want to create, so here is an example. Say I have a dataset like the one below:

country year    X
A       1990    0
A       1991    1
A       1992    2
A       1993    3
A       1994    3
B       1990    1
B       1991    2
B       1992    3
B       1993    3
C       1990    0
C       1991    1
C       1992    2
C       1993    3
C       1994    4

The variable X counts the number of times a country appears in the media. Note though that it sometimes stays on the same number for several years – this is because no new appearances are reported for that year.

So I want to create a variable that only captures increases. Let's call this variable "Xnew". I give an example of what it would look like below:

country year    X   Xnew
A       1990    0   0
A       1991    1   1
A       1992    2   1
A       1993    3   1
A       1994    3   0
B       1990    1   1
B       1991    2   1
B       1992    3   1
B       1993    3   0
C       1990    0   0
C       1991    1   1
C       1992    2   1
C       1993    3   1
C       1994    4   1

As you can see, "Xnew" is binary: 1 marks a year in which X increased, and 0 marks every other year.

My attempt at creating this variable was the following:

> data$Xnew <- as.numeric(X >1)

But it doesn't really do what I want, though I sense that the solution lies somewhere close to this. Any suggestions? Thanks!

A reproducible sample:

> dput(data)
structure(list(country = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), 
    year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1990L, 1991L, 
    1992L, 1993L, 1990L, 1991L, 1992L, 1993L, 1994L), X = c(0L, 
    1L, 2L, 3L, 3L, 1L, 2L, 3L, 3L, 0L, 1L, 2L, 3L, 4L)), .Names = c("country", 
"year", "X"), class = "data.frame", row.names = c(NA, -14L))

`library(dplyr) ; data %>% mutate(Xnew = as.integer(c(0, diff(X)) > 0))` — alistaire, Jun 28 '16 at 19:32

akrun · Accepted Answer · 2016-06-28T19:39:39.307

3

We can use ave from base R to apply the test within each country:

data$Xnew <- with(data, ave(X, country, FUN = function(x) c(TRUE, diff(x) != 0) & x != 0))
data$Xnew
#[1] 0 1 1 1 0 1 1 1 0 0 1 1 1 1
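To see what the function passed to ave does, it may help to run it by hand on one country's values. The sketch below uses country A's X from the sample data; the names changed, nonzero and xnew_A are illustrative and not part of the answer.

```r
# X values for country A, copied from the sample data
x <- c(0L, 1L, 2L, 3L, 3L)

changed <- c(TRUE, diff(x) != 0)  # the first year has no predecessor, so it starts as TRUE
nonzero <- x != 0                 # a value of 0 never counts as an increase

xnew_A <- as.integer(changed & nonzero)
xnew_A
#[1] 0 1 1 1 0
```

Since X here is a running count, it never decreases. If it could, diff(x) > 0 would be the stricter test, because != 0 also flags decreases.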

Or with data.table

library(data.table)
setDT(data)[, Xnew := as.integer((X - shift(X, fill = 0)) > 0), by = country]
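Here shift(X, fill = 0) supplies the previous row's X within each country, with 0 standing in before the first year. A rough base R sketch of the same idea for a single country (B's values from the sample; prev and xnew_B are illustrative names) might look like this:

```r
# X values for country B, copied from the sample data
x <- c(1L, 2L, 3L, 3L)

prev <- c(0L, head(x, -1))          # previous value, with 0 filling the first slot
xnew_B <- as.integer(x - prev > 0)  # 1 only where X exceeds the previous value
xnew_B
#[1] 1 1 1 0
```

Filling with 0 is what makes B's first year count as an increase, which matches the desired output in the question.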

edited Jun 28 '16 at 19:39

answered Jun 28 '16 at 19:31

akrun


Hi akrun. Thanks for this! I have another question posted 3 days ago "How to create a variable (that captures increases at a certain threshold) R?" I would really appreciate if you had the time to look at it. You seem to be the right person to solve it. – FKG Jul 01 '16 at 10:07
http://stackoverflow.com/questions/38062637/how-to-create-a-variable-that-captures-increases-at-a-certain-threshold-r – FKG Jul 01 '16 at 11:17
let me know if you can't find it – FKG Jul 01 '16 at 11:18
I found it, I will check it – akrun Jul 01 '16 at 11:18

dww · Answer 2 · 2016-06-28T21:39:49.460

3

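You can use diff to test whether X increased from the previous row:

data$Xnew <- 0L                                 # default: no increase
data$Xnew[which(diff(data$X) > 0) + 1L] <- 1L   # mark rows whose X exceeds the previous row's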
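One caveat: diff(data$X) runs over the whole column, so the first row of each country is compared with the last row of the previous one, and B's first year (X = 1) stays at 0 rather than the 1 shown in the desired output. A possible per-country variant, sketched here as an assumption about the intended behaviour rather than as part of this answer, prepends 0 within each group. It assumes rows are already sorted by year within country.

```r
# Sample data from the question, rebuilt here so the block runs on its own
data <- data.frame(
  country = rep(c("A", "B", "C"), times = c(5, 4, 5)),
  year    = c(1990:1994, 1990:1993, 1990:1994),
  X       = c(0L, 1L, 2L, 3L, 3L, 1L, 2L, 3L, 3L, 0L, 1L, 2L, 3L, 4L)
)

# Within each country, prepend 0 so the first year is compared against 0
data$Xnew <- ave(data$X, data$country,
                 FUN = function(x) as.integer(diff(c(0L, x)) > 0))
data$Xnew
#[1] 0 1 1 1 0 1 1 1 0 0 1 1 1 1
```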

edited Jun 28 '16 at 21:39

answered Jun 28 '16 at 20:36

dww


score 0 · Answer 3 · answered Jun 28 '16 at 19:58

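Here is another option using the zoo library (more complicated than the ones above)

library(zoo); library(dplyr)

tmp <- data.frame()
for (s in unique(data$country)) {
  d <- dplyr::filter(data, country == s)     # rows for one country
  d <- d[order(d$year), ]                    # sort by year
  if (nrow(d) == 1) {
    d$Xnew <- as.integer(d$X > 0)            # single year: compare against 0
  } else {
    # stats::lag dispatches to zoo's method (dplyr masks lag); k = -1 gives the previous value
    previous <- coredata(stats::lag(zoo(d$X), -1, na.pad = TRUE))
    previous[is.na(previous)] <- 0           # no earlier year: compare against 0
    d$Xnew <- as.integer(d$X - previous > 0) # 1 only for an increase
  }
  tmp <- rbind(tmp, d)
}
tmp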
