I am attempting to apply a custom function to each row of a data frame, where the function uses values from that row to do a calculation. I have made a trivial example below because my actual problem is very hard to turn into a reproducible example. In the example below, I want two of the numeric columns to be added together to create a new column containing their sum. Here is an example I found online that gets close to what I want:

celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
                       age=c(28,23,49,29,38,23,29),
                       income=c(25.2,10.5,11,21.9,44,11.5,45))
f=function(x,output){
  name=x[1]
  income=x[3]
  cat(name,income,"\n")
}
apply(celebrities,1,f)

But when I try to adapt it to apply a mathematical function, it doesn't work:

f2=function(x,output){
  age=x[2]
  income=x[3]
  sum(age,income)
}
apply(celebrities,1,f2)

In essence, what I need is for apply to take a dataset, go through every row of that dataset using the values in that row as inputs to the function, and add a new column to the dataset with the results of the function. Please let me know how I can clarify this question if needed. I have referred to the questions below, but they don't seem to work for me.

Apply a function to every row of a matrix or a data frame

How to assign new values from lapply to new column in dataframes in list

Call apply-like function on each row of dataframe with multiple arguments from each row

user2355903
  • When you use `apply` on a `data.frame`, it is converted to a `matrix` for the processing. If any of the columns (of the processed frame) are `character`, then all columns are converted to `character`, defeating any math operations. Though I tend to discourage `apply` with frames, if you must then make sure that you only use a portion of it, something like `apply(celebrities[c("age","income")], 1, sum)`. – r2evans Jul 12 '18 at 04:56
  • You could try using something from `library(plyr)` such as `adply` or `aaply` (depending on what you want the output format to be like) which don't coerce all columns to `character` – Sarah Jul 12 '18 at 05:15
  • I believe `dplyr` now has a `rowwise` function that can help you do what you're looking for. E.g., `library(dplyr) ; celebrities %>% rowwise %>% mutate(new_var = f(var1, var2))` – Jake Fisher Jul 12 '18 at 16:15

3 Answers

For the particular task requested it could be

celebrities$newcol <- with(celebrities, age + income)

The + function is inherently vectorized, so using apply with sum is inefficient. The apply call could also have been greatly simplified by omitting the first column, which avoids the coercion to a character matrix that the name column causes.

celebrities$newcol <- apply(celebrities[-1], 1, function(x) sum(x))

That way you would avoid coercing the vectors to "character" and then needing to coerce back the formerly-numeric columns to numeric. Using sum inside apply does get around the fact that sum is not vectorized, but it's an example of inefficient R coding.
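The coercion described above can be checked directly. The following is a minimal sketch using the question's `celebrities` data. The intermediate variable names are made up for illustration, and `rowSums` is offered as one vectorized base R alternative that this answer does not itself mention.

```r
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany", "Philip", "John", "bing", "Monica"),
  age    = c(28, 23, 49, 29, 38, 23, 29),
  income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45)
)

# apply() converts the data frame with as.matrix(); a single character
# column is enough to make the whole matrix character.
full_is_character <- is.character(as.matrix(celebrities))

# Keeping only the numeric columns yields a numeric matrix instead.
numeric_cols <- celebrities[c("age", "income")]
subset_is_numeric <- is.numeric(as.matrix(numeric_cols))

# Row-wise sum with apply() on the numeric columns only.
sum_apply <- apply(numeric_cols, 1, sum)

# Vectorized alternative (an addition here, assuming a plain row sum
# is all that is needed).
sum_rowsums <- rowSums(numeric_cols)
```

Both `sum_apply` and `sum_rowsums` should hold the same row totals as `age + income`.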

You get automatic vectorization if the "inner" algorithm can be constructed completely from vectorized functions: the Math and Ops groups being the usual components. See ?Ops. Otherwise, you may need to use mapply or Vectorize.
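As a rough illustration of the `mapply` and `Vectorize` route, here is a sketch with a made-up function, `score_one`, that is not vectorized because it branches with `if` on a single value. The function and the `score` column names are hypothetical and stand in for whatever the real calculation is.

```r
celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany", "Philip", "John", "bing", "Monica"),
  age    = c(28, 23, 49, 29, 38, 23, 29),
  income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45)
)

# Hypothetical scalar function: `if` only accepts a length-one condition,
# so this cannot be called on whole columns directly.
score_one <- function(age, income) {
  if (age > 30) income / 2 else income + age
}

# mapply() calls score_one once per row, pairing up the two columns.
celebrities$score <- mapply(score_one, celebrities$age, celebrities$income)

# Vectorize() wraps the same function so it can be called like a
# vectorized one; internally it also uses mapply().
score_vec <- Vectorize(score_one)
celebrities$score2 <- score_vec(celebrities$age, celebrities$income)
```

Because only the numeric columns are passed in, nothing is coerced to character along the way.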

IRTFM

Taking hints from @r2evans and @user2738526, I have modified your function to explicitly convert the values back to numeric. The code snippet below works for your case:

f2=function(x,output){
  age=as.numeric(x[2])
  income=as.numeric(x[3])
  sum(age,income)
}
apply(celebrities,1,f2)

[1] 53.2 33.5 60.0 50.9 82.0 34.5 74.0
SatZ
  • I'm not downvoting because it did solve the problem after identifying the errors in the original, but you should understand that this is rather ugly R coding and would not be a good example for others to emulate. `r2evans` gave better advice and you seemed to ignore it. – IRTFM Jul 12 '18 at 16:09
  • @42- I don't like my solution either; it is rather inefficient, cumbersome and what not, but I thought it might help if the OP had a numerical function other than `sum`. r2evans gave a better solution, I agree. The original question doesn't need `apply` at all – SatZ Jul 12 '18 at 16:50
  • I'll delete it if it is not worth keeping it up here – SatZ Jul 12 '18 at 16:50
  • It could be useful as a basis for explaining _why_ you should be searching for other strategies before using `apply`. I'd upvote it if it were improved in that direction. – IRTFM Jul 12 '18 at 18:11
  • @42- sure I'll do that. I'd need to read up a bit more to improve it in the direction suggested – SatZ Jul 13 '18 at 06:44

Give this a try:

library(dplyr)
celebrities=data.frame(name=c("Andrew","matt","Dany","Philip","John","bing","Monica"),
                       age=c(28,23,49,29,38,23,29),
                       income=c(25.2,10.5,11,21.9,44,11.5,45)) 

celebrities %>% 
  rowwise %>% 
  mutate(age_plus_income = sum(age, income))

(Obviously, for summing two columns, you'd be better off using mutate(celebrities, age_plus_income = age + income), but I assume your real example uses a more complicated function.)
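For a function that really does need one row at a time, the same `rowwise` pattern might look like the sketch below. `score_one` is a hypothetical stand-in for the more complicated function and is not part of the question. The example assumes the dplyr package is installed, and the final `ungroup()` is added only so that later verbs stop working row by row.

```r
library(dplyr)

celebrities <- data.frame(
  name   = c("Andrew", "matt", "Dany", "Philip", "John", "bing", "Monica"),
  age    = c(28, 23, 49, 29, 38, 23, 29),
  income = c(25.2, 10.5, 11, 21.9, 44, 11.5, 45)
)

# Hypothetical scalar function that cannot take whole columns,
# because `if` needs a single TRUE/FALSE.
score_one <- function(age, income) {
  if (age > 30) income / 2 else income + age
}

result <- celebrities %>%
  rowwise() %>%                               # evaluate mutate() one row at a time
  mutate(score = score_one(age, income)) %>%  # age and income are single values here
  ungroup()                                   # drop the row-wise grouping afterwards
```

Inside `mutate`, the columns keep their own types, so `age` and `income` stay numeric even though `name` is a character column.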

Jake Fisher