How to efficiently divide successor by predecessor in each column of a dataframe

Question

I have a dataframe myDF created like this:

a <- 1:4
b <- seq(3, 16, length=4)
myDF <- data.frame(a=a, b=b)

which looks like this:

  a         b
1 1  3.000000
2 2  7.333333
3 3 11.666667
4 4 16.000000

Now I want to divide each value by its predecessor within each column (successor divided by predecessor), add the results to the existing dataframe as new columns, fill the one missing value in each new column with NA, and give the new columns new names. For the example above, my desired outcome looks like this:

  a         b     amod     bmod
1 1  3.000000       NA       NA
2 2  7.333333 2.000000 2.444444
3 3 11.666667 1.500000 1.590909
4 4 16.000000 1.333333 1.371429

So, in column a 2 is divided by 1, 3 is divided by 2, and 4 is divided by 3 and the results are stored in amod.

The way I do it now is like this:

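divStuff <- function(aCol) {
  n <- length(aCol)
  # successor (elements 2..n) divided by predecessor (elements 1..n-1)
  newCol <- aCol[2:n] / aCol[1:(n - 1)]
  # the first row has no predecessor, so pad with NA
  newCol <- c(NA, newCol)

  return(newCol)
}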
newDF <- data.frame(lapply(myDF, divStuff))
names(newDF) <- paste(names(myDF), "mod", sep="")
endDF <- cbind(myDF, newDF)

I wrote a function divStuff which does the division and then call lapply which applies this function to each column of the data frame.
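
A side note on index arithmetic in helpers like `divStuff`: in R the `:` operator binds tighter than binary `-`, so an expression of the form `1:n - 1` is evaluated as `(1:n) - 1`. It still selects the first n - 1 elements, but only because a zero index is silently dropped. A minimal sketch with a made-up vector `x` (not part of the question's data):

```r
x <- c(10, 20, 30, 40)
n <- length(x)

# `:` binds tighter than binary minus, so this is (1:n) - 1, i.e. 0 1 2 3
idxLoose <- 1:n - 1
# Explicit parentheses give the intended 1 2 3
idxStrict <- 1:(n - 1)

# A zero index selects nothing, so both pick the same elements here
x[idxLoose]   # 10 20 30
x[idxStrict]  # 10 20 30

# Negative indexing expresses "all but the last" without index arithmetic
x[-n]         # 10 20 30
```

Writing the parentheses explicitly, or using negative indexing, makes the intent visible and avoids relying on the dropped zero index.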

Now I am wondering whether this is the right way to do it, or whether there is a smarter way to do this kind of operation, e.g. one that avoids the cbind call, or one that adds the NA automatically so that the line newCol <- c(NA, newCol) is no longer needed. I did not find a nice way; all the solutions I came up with look similar to this one.

David Arenburg · Answer 1 · 2015-09-02T20:47:55.950

7

Here's a quick data.table version (using the devel version on GH)

library(data.table) ## V 1.9.5
setDT(myDF)[, paste0(names(myDF), "mod") := lapply(.SD, function(x) x/shift(x))]
#    a         b     amod     bmod
# 1: 1  3.000000       NA       NA
# 2: 2  7.333333 2.000000 2.444444
# 3: 3 11.666667 1.500000 1.590909
# 4: 4 16.000000 1.333333 1.371429

Or similarly with dplyr, though you may want to play around with the column names (this is due to a bug(?) in mutate_each: when given a single function, it drops the original columns and doesn't rename the resulting ones)

library(dplyr)
myDF %>% 
  mutate_each(funs(./lag(.))) %>%
  cbind(myDF, .)
#   a         b        a        b
# 1 1  3.000000       NA       NA
# 2 2  7.333333 2.000000 2.444444
# 3 3 11.666667 1.500000 1.590909
# 4 4 16.000000 1.333333 1.371429
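
For readers on a newer dplyr: `mutate_each()` and `funs()` have since been deprecated. A sketch of the same idea, assuming dplyr 1.0.0 or later, uses `across()`, whose `.names` argument can produce the suffixed column names directly. The name `resDF` is only illustrative:

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, where across() is available

myDF <- data.frame(a = 1:4, b = seq(3, 16, length.out = 4))

# Divide every column by its lagged self; lag() puts NA in the first row.
# .names = "{.col}mod" keeps the originals and appends "mod" to the new ones.
resDF <- myDF %>%
  mutate(across(everything(), ~ .x / lag(.x), .names = "{.col}mod"))

names(resDF)  # "a" "b" "amod" "bmod"
```

With this form no `cbind()` is needed, because `mutate()` keeps the original columns alongside the new ones.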

edited Sep 02 '15 at 20:47

answered Sep 02 '15 at 20:07

David Arenburg


That looks more like what I had in mind for such a 'simple' task. I'll try it once I have updated my R version; with my current one I run into dependency issues. But thanks a lot for the input already! – Cleb Sep 02 '15 at 20:14
Works fine as well! The first solution also requires the package 'magic', which contains the 'shift' function; maybe you can update your post? Since it is a nice, working solution, I upvoted it, but I will accept Pierre Lafortune's one since it does not require any additional packages. – Cleb Sep 02 '15 at 20:52
If you want to avoid `shift` you can simply do `setDT(myDF)[, paste0(names(myDF), "mod") := lapply(.SD, function(x) x/c(NA, x[-.N]))]` – David Arenburg Sep 02 '15 at 20:56
OK, thanks! I learned a lot from your answer! The comment about 'magic' was more for other people who want to reproduce your result; I found it quickly, but others might not find it. – Cleb Sep 02 '15 at 21:03
To the data.table package but not to 'magic' or do I overlook something? – Cleb Sep 02 '15 at 21:10
To [the devel version on GH](https://github.com/Rdatatable/data.table/wiki/Installation) – David Arenburg Sep 02 '15 at 21:11
OK, now I see; I have indeed version 1.9.4 while you have 1.9.5 - my fault, then it should be fine. – Cleb Sep 02 '15 at 21:15

Pierre L · Accepted Answer · 2015-09-02T20:19:54.043

6

With base R:

myDF[,paste0(names(myDF), "mod")] <- sapply(myDF, function(x) c(NA, x[-1]/head(x,-1)))
#  a         b     amod     bmod
#1 1  3.000000       NA       NA
#2 2  7.333333 2.000000 2.444444
#3 3 11.666667 1.500000 1.590909
#4 4 16.000000 1.333333 1.371429
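
To unpack the one-liner: `x[-1]` drops the first element (the successors), `head(x, -1)` drops the last element (the predecessors), and `c(NA, ...)` pads the first row. A sketch of the same logic wrapped in a reusable function follows; the helper name `addRatioCols` and its `suffix` argument are illustrative assumptions, not part of the answer, and `lapply` is used in place of `sapply` so the result stays a list of columns:

```r
# Hypothetical helper: appends successor/predecessor ratio columns to df
addRatioCols <- function(df, suffix = "mod") {
  # ratio of each element to the one before it; first row has no predecessor
  ratio <- function(x) c(NA, x[-1] / head(x, -1))
  # new column names come from the index on the left-hand side
  df[paste0(names(df), suffix)] <- lapply(df, ratio)
  df
}

myDF <- data.frame(a = 1:4, b = seq(3, 16, length.out = 4))
endDF <- addRatioCols(myDF)
names(endDF)  # "a" "b" "amod" "bmod"
```

As in the one-liner above, the new columns are assigned in place, so neither `cbind` nor a separate renaming step is required.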

edited Sep 02 '15 at 20:19

answered Sep 02 '15 at 20:16

Pierre L


Probably `paste0(names(myDF), "mod")` will be more general – David Arenburg Sep 02 '15 at 20:17

Excellent, that works fine! I'll upvote it for now and might accept it later on. – Cleb Sep 02 '15 at 20:18
