I have 10 years of contribution data in R. The dollar values are grouped by ID# (of the person giving the gift) and year given. There is not a gift for every person in every year. For each row, I want to indicate whether the gift is the first contribution (never before given), the same as the prior year, greater than the prior year, less than the prior year, or follows a year with no gift (but with a gift in some earlier year). In addition, I want to indicate whether the person giving this gift did not give a gift in the NEXT year.

So, if the data looks like this:

ID#    YEAR    GIFT
1      2005    $10
1      2006    $5
1      2008    $15
1      2009    $20
1      2010    $20


the result should be:

ID#    YEAR    GIFT    STATUS
1      2005    $10     FIRST
1      2006    $5      LOWER    also NO NEXT YEAR
1      2008    $15     PREVIOUS GIVER
1      2009    $20     HIGHER
1      2010    $20     SAME

Thanks!

Dave1956
  • It would be nice if your example were reproducible, in the sense described here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example As it is, it's not clear if your columns are factors or characters/strings. – Frank May 10 '15 at 02:10
  • Sorry about that, I am new here (first question). I am willing to have the years be character, numeric or factors, whichever works best. The gifts are numeric, and the ID#s are preferably factors/characters, but can be numeric if required. – Dave1956 May 10 '15 at 02:16
  • look at the `shift` function in the development version of `data.table` (there are a bunch of recent questions on SO that should give you examples). that will answer most of your questions. `dt[,first:=.I==1,by=id]` will give you an indicator for first gift (make sure it is sorted by year first) – MichaelChirico May 10 '15 at 02:50

1 Answer

Here is a solution using dplyr, with a helper function that determines the status and keeps the code cleaner. The data:

data <- read.table(text="ID          YEAR          GIFT
1               2005          $10
1               2006          $5
1               2008          $15
1               2009          $20
1               2010          $20", header=TRUE)

To get the output you want, we must compare each value (this) to its previous (prev) and next (follow) values, and also check whether it's the first or last of the group.

getStatus <- function(first, prev, this, follow, last) {
  if (first) {
    status <- "FIRST" #Easy one
  } else if (length(prev) < 1 || is.na(prev)) { #Not the first, but prev missing
    status <- "PREVIOUS GIVER"
  } else if (this < prev) { #The next 3 are obvious
    status <- "LOWER"
  } else if (this == prev) {
    status <- "SAME"
  } else if (this > prev) {
    status <- "HIGHER"
  }
  #Use the scalar && here: an if() condition needs a single TRUE/FALSE
  if ((length(follow) < 1 || is.na(follow)) && !last) { #No next but isn't last
    status <- paste(status, "also NO NEXT YEAR")
  }
  return(status)
}

Now that we have our function, we must work on the data. We'll use dplyr to make things more readable.

library(dplyr)

result <- data %>%
  arrange(ID, YEAR) %>% #We make sure YEAR is sorted ascending within each ID
  group_by(ID) %>%
  mutate(gift.num = as.numeric(gsub("\\$", "", GIFT))) %>% #Create a column with the gifts as numbers
  mutate(RESULT = sapply(YEAR, function(y) {
    #Apply getStatus passing the corresponding arguments to create RESULT.
    #Bare column names are used instead of .$YEAR because `.` is the whole
    #data frame, while bare names refer to the current ID's rows only.
    getStatus(first(YEAR) == y, gift.num[which(YEAR == y - 1)],
              gift.num[which(YEAR == y)], gift.num[which(YEAR == y + 1)],
              last(YEAR) == y)
  })) %>%
  select(-gift.num) #Removing the dummy column

This gives us:

  ID YEAR GIFT                  RESULT
1  1 2005  $10                   FIRST
2  1 2006   $5 LOWER also NO NEXT YEAR
3  1 2008  $15          PREVIOUS GIVER
4  1 2009  $20                  HIGHER
5  1 2010  $20                    SAME

It would be better to test this on more data, with several IDs and different gap patterns, to make sure all scenarios are covered. Even if some are not, this should give you enough to fix any remaining bugs.
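
For checking the multi-donor case, a vectorized variant may be easier to test than the row-by-row function. The following is only a sketch under a few assumptions: the gifts are already numeric (as mentioned in the comments), each ID has at most one row per year, donor 2 is made-up test data, and the "no next year" flag is kept in a separate, hypothetically named NO_NEXT_YEAR column instead of being pasted into the status. It relies on dplyr's lag(), lead() and case_when() rather than getStatus.

```r
library(dplyr)

# Donor 1 is the example from the question; donor 2 is invented
# purely to exercise the grouping.
gifts <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 2, 2, 2),
  YEAR = c(2005, 2006, 2008, 2009, 2010, 2006, 2007, 2010),
  GIFT = c(10, 5, 15, 20, 20, 50, 50, 25)
)

result <- gifts %>%
  arrange(ID, YEAR) %>%   # sort so lag()/lead() see neighbouring years
  group_by(ID) %>%        # compare only within the same donor
  mutate(
    prev_year = lag(YEAR),
    prev_gift = lag(GIFT),
    next_year = lead(YEAR),
    STATUS = case_when(
      row_number() == 1     ~ "FIRST",
      prev_year != YEAR - 1 ~ "PREVIOUS GIVER",  # gave before, but not last year
      GIFT < prev_gift      ~ "LOWER",
      GIFT == prev_gift     ~ "SAME",
      TRUE                  ~ "HIGHER"
    ),
    # Assumption: flag a gap only when a later gift exists, so a donor's
    # final row is not flagged (same rule as getStatus above).
    NO_NEXT_YEAR = !is.na(next_year) & next_year != YEAR + 1
  ) %>%
  ungroup() %>%
  select(ID, YEAR, GIFT, STATUS, NO_NEXT_YEAR)

print(result)
```

If a donor's last recorded gift should also count as "no next year" (for example when the data ends before the current year), the `!is.na(next_year)` condition would need to change accordingly.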

Molx
  • Thanks! This definitely got me going in the right direction. As you said, it needed some debugging with more data, but I really appreciate the help! – Dave1956 May 10 '15 at 22:04