I have a reasonably large R data frame. I wish to use mapply to take inputs from several of the columns, pass them row by row to a function that returns a numeric vector of length 5, and assign that vector, again row by row, to five columns in the original data frame.
So far so good, and it is working fine.
But additionally I want to be able to do this on a subset of rows.
Thus, to call my.function for all rows in my.df, using columns my.df$a, my.df$b and my.df$c as inputs and writing the output to, for example, columns 11 to 15 of my.df, the syntax would be:
# mapply returns a 5 x nrow matrix (one column per row), so transpose it
my.df[, 11:15] <- t(mapply(my.function, my.df$a, my.df$b, my.df$c))
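A small sketch of what mapply hands back in this situation may help. Here my.function is a hypothetical stand-in that returns a length-5 vector, and the output columns are named out1 to out5 instead of 11:15 purely so the toy example is self-contained. Because mapply simplifies the per-row results column-wise into a 5 x n matrix, the result is transposed with t() before it is written back.

```r
# Hypothetical stand-in for my.function: takes three scalars and
# returns a numeric vector of length 5
my.function <- function(x, y, z) c(x + y, x - y, x * y, z, x + y + z)

# Toy data: input columns a, b, c plus five (initially empty) output columns
my.df <- data.frame(a = 1:4, b = 5:8, c = 9:12)
out.cols <- paste0("out", 1:5)
my.df[out.cols] <- NA_real_

# mapply() calls my.function once per row and simplifies the results
# column-wise: one column per input row, one row per output value
res <- mapply(my.function, my.df$a, my.df$b, my.df$c)
dim(res)  # 5 4

# Transpose so that each input row's five results land in that same row
my.df[, out.cols] <- t(res)
```

Without the t(), the 5 x n matrix would be recycled column-major into the n x 5 block, so values would end up in the wrong rows rather than raising an error.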
However, if I want to run it over a large but incomplete subset of the data frame, the syntax gets a bit messier.
If my.subset is the vector of row indices to use, the syntax looks like this:
my.df[my.subset, 11:15] <- t(mapply(my.function, my.df$a[my.subset], my.df$b[my.subset], my.df$c[my.subset]))
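One possible way to avoid repeating [my.subset] is to subset the data frame once and let with() evaluate the mapply call inside it, so a, b and c refer to the subsetted columns. This is a sketch under the same assumptions as above: my.function is a hypothetical stand-in, and out1 to out5 play the role of columns 11:15.

```r
# Hypothetical stand-in for my.function returning a length-5 vector
my.function <- function(x, y, z) c(x + y, x - y, x * y, z, x + y + z)

my.df <- data.frame(a = 1:6, b = 7:12, c = 13:18)
out.cols <- paste0("out", 1:5)
my.df[out.cols] <- NA_real_

my.subset <- c(2, 4, 5)

# with() looks up a, b and c in the subsetted data frame, so the
# subset is written only once; t() turns 5 x k into k x 5
my.df[my.subset, out.cols] <- t(with(my.df[my.subset, ], mapply(my.function, a, b, c)))
```

Rows outside my.subset keep their previous values (NA here). A logical vector works as my.subset just as well as a vector of row numbers.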
It seems a little long-winded to keep repeating [my.subset].
Is there a slightly more succinct yet readable way to do this?
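If the same pattern recurs with different columns, it could be wrapped in a small helper. The sketch below uses a hypothetical function, apply.rows (not part of base R), which builds the argument list from the chosen rows and columns and passes it to mapply via do.call. It assumes the function returns a vector with more than one element and that the output columns already exist.

```r
# Hypothetical helper (not part of base R): applies f row by row to the
# columns in `inputs`, for the rows in `rows`, and writes the results into
# the existing columns `outputs`. Assumes f returns length(outputs) > 1 values.
apply.rows <- function(df, rows, inputs, outputs, f) {
  # One list element per input column, restricted to the chosen rows;
  # unname() makes mapply pass them to f positionally
  args <- unname(as.list(df[rows, inputs, drop = FALSE]))
  res <- do.call(mapply, c(list(FUN = f), args))
  # mapply returns one column per row, so transpose before assigning
  df[rows, outputs] <- t(res)
  df
}

# Hypothetical stand-in for my.function returning a length-5 vector
my.function <- function(x, y, z) c(x + y, x - y, x * y, z, x + y + z)

my.df <- data.frame(a = 1:6, b = 7:12, c = 13:18)
out.cols <- paste0("out", 1:5)
my.df[out.cols] <- NA_real_

my.df <- apply.rows(my.df, c(2, 4, 5), c("a", "b", "c"), out.cols, my.function)
```

Both inputs and outputs accept either names or column numbers, so the original call would become apply.rows(my.df, my.subset, c("a", "b", "c"), 11:15, my.function).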
Addendum: ideally the solution will not preclude me from making this call parallel in future, as I have some pretty large data frames to process and want to start using the machine more efficiently.
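The with() approach should not get in the way of parallelising later, because parallel::mcmapply from the base parallel package takes the same FUN-then-vectors arguments as mapply and simplifies its result the same way. A minimal sketch, again with a hypothetical my.function; the core count of 2 is an arbitrary assumption. mcmapply relies on forking, which is not available on Windows, so the sketch falls back to one core there; a cluster-based function such as parallel::clusterMap would be the usual route on that platform.

```r
library(parallel)

# Hypothetical stand-in for my.function returning a length-5 vector
my.function <- function(x, y, z) c(x + y, x - y, x * y, z, x + y + z)

my.df <- data.frame(a = 1:6, b = 7:12, c = 13:18)
out.cols <- paste0("out", 1:5)
my.df[out.cols] <- NA_real_
my.subset <- c(2, 4, 5)

# Forking is unsupported on Windows, where mc.cores must be 1
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same shape as the mapply version: a 5 x k matrix, transposed to k x 5
my.df[my.subset, out.cols] <- t(with(my.df[my.subset, ],
  mcmapply(my.function, a, b, c, mc.cores = n.cores)))
```

For a function this cheap the overhead of starting workers will outweigh any gain; parallelism only pays off when each call to my.function does a meaningful amount of work.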