I have a dataframe with multiple numeric and character columns. For example,

> df <- data.frame(Name=c('John','Tom','Sarah'), Quantity=c(3,4,5), Price=c(5,6,7))
> df
   Name Quantity Price
1  John        3     5
2   Tom        4     6
3 Sarah        5     7

I would like to write a function that checks whether name is John or Tom and calculates, say, Sales=Quantity*Price. This function would look like the following:

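myFunc <- function(x) {
  # x is one row of df (a one-row data frame or a list)
  if (x[["Name"]] %in% c('John', 'Tom')) {
    x[["Quantity"]] * x[["Price"]]  # Sales for John or Tom
  } else {
    NA
  }
}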

I would like to send each row of my dataframe to the function to get the following output:

   Name Quantity Price Sales
1  John        3     5    15
2   Tom        4     6    24
3 Sarah        5     7    NA

I tried following the suggestions in the question "Call apply-like function on each row of dataframe with multiple arguments from each row", without any success.

How can I achieve this in R? Thanks for any help.

LaEdri

1 Answer

In R, and especially in your case, you can use vectorised functions. They operate on whole vectors at once, so instead of applying the function to each row separately, you can pass the complete columns directly:

df <- data.frame(Name=c('John','Tom','Sarah'), Quantity=c(3,4,5), Price=c(5,6,7))

my_vectorised_fun <- function(name, quantity, price) {
  sales <- quantity * price
  
  # flag the rows whose name is not John or Tom and set their sales to NA
  index_names <- !name %in% c("John", "Tom")
  sales[index_names] <- NA
  
  sales
}

library(dplyr)
df %>% 
  mutate(Sales = my_vectorised_fun(Name, Quantity, Price))
#>    Name Quantity Price Sales
#> 1  John        3     5    15
#> 2   Tom        4     6    24
#> 3 Sarah        5     7    NA

Created on 2021-02-19 by the reprex package (v0.3.0)
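
For comparison, the same vectorised idea can be sketched in base R with `ifelse()`, without loading any package. This is only an illustration that assumes the same `df` as above; it is not part of the original answer.

```r
df <- data.frame(Name = c("John", "Tom", "Sarah"),
                 Quantity = c(3, 4, 5),
                 Price = c(5, 6, 7))

# ifelse() is vectorised: for each row it takes Quantity * Price when the
# name matches and NA otherwise, so no explicit loop over rows is needed.
df$Sales <- ifelse(df$Name %in% c("John", "Tom"),
                   df$Quantity * df$Price,
                   NA_real_)
df
```

The index-based version in the answer and this `ifelse()` version should produce the same `Sales` column; which one reads better is mostly a matter of taste.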


Edit

Here is a version where you pass the complete .data pronoun to the function, so the column names only have to appear inside the function body and not in its argument list:

df <- data.frame(Name=c('John','Tom','Sarah'), Quantity=c(3,4,5), Price=c(5,6,7))

my_vectorised_fun <- function(all_data) {
  sales <- all_data[["Quantity"]] * all_data[["Price"]]
  
  # flag the rows whose name is not John or Tom and set their sales to NA
  index_names <- !all_data[["Name"]] %in% c("John", "Tom")
  sales[index_names] <- NA
  
  sales
}

library(dplyr)
df %>% 
  mutate(Sales = my_vectorised_fun(.data))
#>    Name Quantity Price Sales
#> 1  John        3     5    15
#> 2   Tom        4     6    24
#> 3 Sarah        5     7    NA

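
Because this version of the function only uses `[[ ]]` to look up columns, it should also work when it is handed the data frame itself, since a plain data frame supports `[[ ]]` too. A minimal sketch, assuming the same `df` and function definition as above and no dplyr:

```r
df <- data.frame(Name = c("John", "Tom", "Sarah"),
                 Quantity = c(3, 4, 5),
                 Price = c(5, 6, 7))

my_vectorised_fun <- function(all_data) {
  sales <- all_data[["Quantity"]] * all_data[["Price"]]
  # rows whose name is not John or Tom get NA
  sales[!all_data[["Name"]] %in% c("John", "Tom")] <- NA
  sales
}

# Pass the whole data frame instead of the .data pronoun
df$Sales <- my_vectorised_fun(df)
df
```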

starja
  • Many thanks @starja. Is it possible to remove the need to specify the column names in function(name, quantity, price)? My actual data contains many different columns that I need to use in the function. So I was wondering if it was possible to pass all column names without the need to specify them, something like function(...)? – LaEdri Feb 19 '21 at 13:58
  • If you use the `.data` pronoun, you can change it so that you only have to specify the names in the function – starja Feb 19 '21 at 16:44
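
If the aim is to avoid both a long argument list and repeated `[[ ]]` lookups, one further option is base R's `with()`, which evaluates an expression with the columns of a data frame visible as variables. The sketch below is an assumption about what would suit data with many columns; `my_with_fun` is a hypothetical name and does not come from the thread.

```r
df <- data.frame(Name = c("John", "Tom", "Sarah"),
                 Quantity = c(3, 4, 5),
                 Price = c(5, 6, 7))

# Hypothetical helper: inside with(), Name, Quantity and Price refer to the
# columns of all_data, so they need not be listed as function arguments.
my_with_fun <- function(all_data) {
  with(all_data, {
    sales <- Quantity * Price
    sales[!Name %in% c("John", "Tom")] <- NA
    sales
  })
}

df$Sales <- my_with_fun(df)
df
```

The function still has to name the columns it uses, but adding another column to the calculation only means using it inside the `with()` block.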