
I have a reasonably large R data frame. I wish to use mapply to take inputs from several of the columns and pass them, row by row, to a function. The function returns a numeric vector of length 5, which I wish to assign, again row by row, to five columns in the original data frame.

So far so good, and it is working fine.

But additionally I want to be able to do this on a subset of rows.

Thus, to call the 'my.function' function for all rows in my.df, using columns my.df$a, my.df$b and my.df$c as inputs, and to write the output to, for example, columns 11 to 15 of my.df, the syntax would be:

my.df[,11:15]<-mapply(my.function, my.df$a, my.df$b, my.df$c)
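
For concreteness, here is a minimal sketch of that call on toy data. The body of `my.function`, the toy values, and the output column names `V1` to `V5` (standing in for columns 11 to 15) are all assumptions made for illustration. One detail worth checking: when the function returns a length-5 vector, `mapply` simplifies the results to a matrix with 5 rows and one column per input row, so the sketch transposes it before assigning.

```r
# Hypothetical stand-in for my.function: returns a length-5 numeric vector.
my.function <- function(a, b, c) (a - b) / c * 1:5

# Toy data frame: three input columns plus five output columns V1..V5
# (assumed names, standing in for columns 11 to 15 of the real data).
my.df <- data.frame(a = c(10, 20, 30), b = c(4, 8, 6), c = c(2, 3, 4))
out.cols <- paste0("V", 1:5)
my.df[out.cols] <- NA_real_

# mapply() simplifies to a 5 x nrow(my.df) matrix, one column per input row.
res <- mapply(my.function, my.df$a, my.df$b, my.df$c)

# Transpose so that each data-frame row receives its own five values.
my.df[, out.cols] <- t(res)
```

Without the transpose, the matrix would be refilled column by column and the values would land in the wrong cells without any warning.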

However, if I want to run that over a large but incomplete subset of the data frame, the syntax seems to be a bit messier.

If my.subset is the vector containing the rows to process, the syntax would look like this:

my.df[my.subset,11:15]<-mapply(my.function, my.df$a[my.subset], my.df$b[my.subset], my.df$c[my.subset])

It seems a little long-winded to keep referring to [my.subset].

Is there a slightly more succinct yet readable way to do this?

Addendum: ideally the solution will not preclude me from making this call parallel in future, as I have some pretty large data frames to process and want to start using the machine more efficiently.
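
As a rough illustration of the parallel option, the sketch below uses `mcmapply` from the base `parallel` package, which takes the same arguments as `mapply` plus `mc.cores`. The body of `my.function`, the toy data, and the variable `n.cores` are assumptions for illustration, not part of the original code.

```r
library(parallel)

# Hypothetical stand-in for my.function: returns a length-5 numeric vector.
my.function <- function(a, b, c) (a - b) / c * 1:5

# Toy data and a toy row subset.
my.df <- data.frame(a = c(10, 20, 30, 40), b = c(4, 8, 6, 10), c = c(2, 3, 4, 5))
my.subset <- c(1, 3, 4)

# Subset the rows once so [my.subset] is not repeated per column.
sub <- my.df[my.subset, ]

# mc.cores = 1 runs serially and works on every platform; values above 1
# rely on forking, which is not available on Windows.
n.cores <- 1L
res <- mcmapply(my.function, sub$a, sub$b, sub$c, mc.cores = n.cores)
```

The result has the same shape as the `mapply` result (5 rows, one column per processed row), so the serial and parallel calls can be swapped without changing the surrounding code.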

Sotos
Pascoe
  • Please show a small reproducible example and expected output. Based on your current code, `mapply` returns a matrix (since `SIMPLIFY = FALSE` is not specified); even if it were specified, it would return a `list` with n elements, where n is the number of rows, while the LHS is a column index for 5 elements, so it does not match what you wanted. It is not clear what `my.function` is; otherwise, create a function that takes 'a', 'b' and 'c' as arguments instead of looping through rows. – akrun Dec 22 '16 at 14:54
  • Possible duplicate of [Call apply-like function on each row of dataframe with multiple arguments from each row](http://stackoverflow.com/questions/15059076/call-apply-like-function-on-each-row-of-dataframe-with-multiple-arguments-from-e) – manotheshark Dec 22 '16 at 15:16
  • It really depends on the function, but it is better to avoid by-row operations in R. It would be better to provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), provide your function and show the desired output. Otherwise, this is just a waste of everyone's time and should be closed as "unclear" – David Arenburg Dec 22 '16 at 15:39

1 Answer


Taking a wild guess at what your data, function and output will look like:

library(plyr)
library(dplyr)

my.df %>%
  adply(1, function(x) (x$a - x$b)/x$c * 1:5)

To perform this on a subset of the data, add slice to the pipe:

my.df %>%
  slice(11:15) %>%  # slice() selects rows by position; put your own row indices here
  adply(1, function(x) (x$a - x$b)/x$c * 1:5)

There are many examples of this on Stack Overflow that can be found with a search.

manotheshark
  • Thanks. I had searched a bit before asking the Q (I understand that it's not the done thing here to simply expect to be spoon fed) however I was hoping to achieve this in base R. However I may look at plyr / dplyr as they seem to be in such ubiquitous use. I am not anti packages per se, merely trying to minimise package proliferation in my code. – Pascoe Dec 22 '16 at 15:23
  • It is important to load *plyr* and *dplyr* in the order @manotheshark did: *plyr* first, then *dplyr*. – Ferdi Dec 22 '16 at 15:30
  • What's the point of loading all these packages in order to mimic a simple `apply` application in order to answer an unclear question? – David Arenburg Dec 22 '16 at 15:40
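
For readers who, like the asker, would rather stay in base R, one possible sketch is to subset the rows once and use `with()` so that the columns can be referred to by bare name. The body of `my.function`, the toy data, and the output column names `V1` to `V5` are assumptions for illustration only.

```r
# Hypothetical stand-in for my.function: returns a length-5 numeric vector.
my.function <- function(a, b, c) (a - b) / c * 1:5

# Toy data frame with five output columns V1..V5 (assumed names).
my.df <- data.frame(a = c(10, 20, 30), b = c(4, 8, 6), c = c(2, 3, 4))
out.cols <- paste0("V", 1:5)
my.df[out.cols] <- NA_real_

# Rows to process.
my.subset <- c(1, 3)

# Subset once, then evaluate mapply() inside the subsetted data frame.
# t() turns the 5 x n result into one row of five values per processed row.
my.df[my.subset, out.cols] <-
  t(with(my.df[my.subset, ], mapply(my.function, a, b, c)))
```

Rows outside `my.subset` keep their existing values (here `NA`), and `[my.subset]` appears only twice regardless of how many input columns the function takes.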