0

I am trying to create a function that can be applied to a single column in a data frame, for each row.

What I want the function to do is to take a total sales value input into a data frame, search for a range it falls in (in a database of 1000 values- it has a column of minsales, maxsales, and commission), and then return the commission.

I have created a working function to do this, however, when I try to apply it to an entire column of a data frame, it doesn't work on each individual row- it just copies the found value from the first row into subsequent rows. I always get the warning "longer object length is not a multiple of shorter object length", which I assume is the result of trying to take a column value, and trying to compare it to a database of about ~1000 values. I'm aware it's just a warning, but I'm guessing it's the reason my code glitches.

So far, I've tried to use apply and lapply for my function as other answers on the site have suggested, but I end up getting "unused argument (X[[i]])" (even though I define my other required arguments properly), and even then, I still get the "longer object length is not a multiple of shorter object length" in addition.

In other words, I want a table that produces this (values are just examples):

    Sales   CommIndexnum(Function applied to entire col)    Commission
    210000  1                                           25771
    210250  2                                           25901
    211000  3                                           26031

But currently I end up with this and the object length warning (values are just examples):

Sales   CommIndexnum(Function applied to entire col)    Commission
25000   1                                               25771
30000   1                                               25771(wrong return value)
35000   1                                               25771(wrong return value)

The root issue seems to be the lookup function so I've left out the value return function. Here is my main code:

# database call
Database <- read.csv("database.csv")


# lookup function that returns index number of commission
    commissionindexnum <- function(totalsales, minv, maxv) {
       which(totalsales >= minv & totalsales <= maxv)
    }       

# test data frame
Employee <-
  data.frame(
    Name = character(3),
    #sales amount used for lookup
    TotalSales = c(212000, 209000, 211000),
    #index number for the value to be used for commission
    CommissionIndexnum = double(3),
    #empty vector- lookup return values should go into the commission section
    Commission = double(3)
)

# errors appear here (database has ~1000 values- total sales amount would be searched for in it)
Employee[,3] <- commissionindexnum(Employee[,2], Database$Min, Database$Max),

Partial Database (as csv):

Min,Max,Commission
209740,210239,25771
210240,210739,25901
210740,211239,26031
211240,211739,26161
211740,212239,26291
212240,212739,26421
alistaire
  • 42,459
  • 4
  • 77
  • 117
Shan
  • 93
  • 1
  • 7
  • I'm a little confused and it's hard to know exactly what to do without a reproducible example. Can you please add at least a part of the Database data frame? – Oriol Mirosa Jul 29 '17 at 19:01
  • yes, give us a reproducible example, and you may want to review your tags as it's not about shiny. I think you may want to lookup `?Map` – moodymudskipper Jul 29 '17 at 19:09
  • Updated- I checked map. If I understand it correctly- it can take a list/column and will return the same if a function is used? – Shan Jul 29 '17 at 19:53
  • If I understand what you want, you could subset with [`findInterval`](https://stackoverflow.com/a/12979557/4497050). `Map` is a multivariate version of `lapply`. Sidenote: `Database` is not really a great name for a data.frame, as they're different things. – alistaire Jul 29 '17 at 20:40

1 Answers1

0

Found an answer thanks to alistaire's comment. Changing the original lookup function to use findInterval fixes all issues. Works using the min column (not max- returns the value above the actual answer).

commissionamt <- function(column, totalsales, minv) {
  column [findInterval(totalsales,  minv)] 
}

#database name kept for example consistency 
Employee[, 4] <-commissionamt(Database$Commission, Employee[, 2], Database$Min) 
Shan
  • 93
  • 1
  • 7