8

I have been working on algorithms and formulas to compute a score for the products available on my e-commerce website. Basically, I want to calculate a score that ranks the products when a user searches for them. Here is some background on the criteria I am planning to use for the ranking:

  • Product Clicks
  • Product views
  • Product Conversions
  • Product Rating (given by users)
  • Relevance to the search string provided by user

Ideally, I want an algorithm that computes all the scores and ranks the products accordingly. I have all the data available, but I am unsure how much weight to give each of the parameters described above.

Any help will be appreciated!

Thanks in advance.

EDIT: I am planning to assign the following weight to each parameter:

  • Product Clicks(CTR) : 1.0
  • Product Views : 1.5
  • Product Conversions : 4.0
  • Product Rating : 2.0

What formula could be used to calculate the score?

  • Well, that's going to be domain specific, and purpose specific. Voting to close as subjective. – amit May 12 '15 at 12:24
  • I am confused about the weight I need to give to the different parameters I am considering, and also about how to normalize all the scores and come up with an unbiased ranking. @amit –  May 12 '15 at 12:32
  • This is going to depend heavily on what you are trying to do, and there is no one definite answer for it. – amit May 12 '15 at 12:34
  • @amit I basically want to assign a score to every available product, based on the parameters I have specified, so that when a user searches for a product on the website, it returns the most relevant products according to that score. The dilemma I am facing is how much weight each factor should get. I know it sounds a little vague, but even an approximate formula will work for the time being. –  May 12 '15 at 12:42
  • @user4115825 How did you solve the problem? I am working on a similar problem and am in the same situation. Could you please share your solution? – Jack Daniel Sep 03 '20 at 16:30

2 Answers

3

You can set this problem up as a prediction, or Learning-to-Rank, problem. First, you want to define an objective function. A reasonable assumption is that ultimately you want to make it as easy as possible for users to buy your products, which means that the products a user is most likely to purchase should rank as high as possible. The notion of "as high as possible" can be made precise by one of the known rank measures (see reference), such as normalized discounted cumulative gain (nDCG) or mean reciprocal rank (MRR) of a purchase. Ranking products according to a statistical model that predicts conversion rates or probability of purchase will lead you towards this goal.
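To make the two rank measures concrete, here is a minimal sketch in Python. The function names and the input encoding (a list of graded relevances in ranked order for nDCG, a list of purchased/not-purchased flags in ranked order for MRR) are assumptions chosen for illustration, not part of any particular library.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of the shown order divided by the DCG of the best possible order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def reciprocal_rank(purchased: Sequence[bool]) -> float:
    """1 / rank of the first purchased item, or 0.0 if nothing was purchased."""
    for rank, bought in enumerate(purchased, start=1):
        if bought:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(sessions: Sequence[Sequence[bool]]) -> float:
    """Average reciprocal rank over several result lists (e.g., one per search session)."""
    if not sessions:
        return 0.0
    return sum(reciprocal_rank(s) for s in sessions) / len(sessions)
```

Both measures reward placing purchased products near the top: moving the only purchased item from rank 1 to rank 3 halves nDCG and cuts the reciprocal rank to one third.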

Now, for a moment let's make the following simplifying assumptions:

  1. There are no queries (i.e., every user sees the same list).
  2. Each day, sales are exactly the same for every product as on the previous day.
  3. Each product is bought at least once a day.
  4. Users look at every item in the result list, then decide if and what to buy.

Under these conditions, ranking by the previous day's sales would always be perfect.

Of course, we have simplified too much.

  1. As the ubiquitous financial disclaimer says, "past performance is not necessarily indicative of future results". Sales change seasonally, weekly, and just randomly.
  2. Much (usually most) sales data is sparse; new products in particular have no data at all, so we need to rely on other information.
  3. The user expresses her intent by typing a query. In principle, we could account for this by tracking sales numbers for each query separately; in practice, however, this would hugely exacerbate the data sparsity problem (see point 2).

Therefore, you want to rank by a function of input features (among them, yesterday's product conversions) that predicts today's product conversions as accurately as possible. This function can be as simple as a weighted sum of features, as you propose, or as complex as a deep neural net. What they have in common is how the model parameters are determined: at the end of day d, collect training data consisting of the feature values from day d-1 and the conversions observed on day d. The latter is our ground truth, but we pretend we don't know it and try to predict it from the former alone, e.g., by means of a linear regression. In doing so, features other than previous sales will turn out to be useful for combating sparsity.
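As a rough illustration of the simplest case, the weights of a weighted sum can be fitted by least squares instead of being chosen by hand. The sketch below is pure Python and solves the normal equations directly. The names `fit_linear_weights` and `predict`, and the small `ridge` term added for numerical stability, are assumptions made for this example. In practice, you would more likely use a regression routine from a numerical library.

```python
from typing import List, Sequence


def fit_linear_weights(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    ridge: float = 1e-6,
) -> List[float]:
    """Fit y ~ w0 + w1*x1 + ... by solving (X^T X + ridge*I) w = X^T y.

    features: one row per product with day d-1 values (e.g., clicks, views, rating).
    targets:  conversions observed on day d for the same products.
    Returns [intercept, w1, w2, ...]. The ridge term is an assumption of this
    sketch; it keeps the system solvable when features are nearly collinear.
    """
    rows = [[1.0, *row] for row in features]  # prepend a constant for the intercept
    n = len(rows[0])
    # Augmented matrix [X^T X + ridge*I | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Predicted conversions for one product; sort by this value to rank."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))
```

Ranking then amounts to sorting products by `predict(weights, row)` in descending order. The learned weights take over the role of hand-picked values such as 1.0, 1.5, 4.0, and 2.0.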

Obviously, I have only scratched the surface. There are many aspects and refinements; for example, assumption 4 above is clearly unrealistic. Because attention is limited, users only look at the topmost results, which leads to so-called position bias.

Hopefully, however, this brief summary will point you in the right direction.

stefan.schroedl
2

Taking the product rating into account makes the calculation more difficult: the number of reviews will always be substantially lower than the number of views, sales, and so on, so the reviews could have a bigger impact on the product score than you would want.

Maybe this paper helps: http://web.engr.oregonstate.edu/~cscaffid/papers/eu_20070611_redopal.pdf
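One common way to limit the influence of a handful of reviews is to shrink each product's average rating towards a prior, often called a Bayesian average. The sketch below is not taken from the linked paper. The function name, the `prior_mean` (for example, the site-wide average rating), and the `prior_weight` default are all assumptions for illustration.

```python
def bayesian_average(
    rating_sum: float,
    rating_count: int,
    prior_mean: float,
    prior_weight: float = 10.0,
) -> float:
    """Average rating shrunk towards prior_mean.

    Behaves as if every product started with `prior_weight` extra reviews,
    each equal to `prior_mean`. With few real reviews the result stays close
    to the prior; with many reviews it approaches the plain average.
    """
    return (prior_weight * prior_mean + rating_sum) / (prior_weight + rating_count)
```

With a prior mean of 3.5, a product with a single 5-star review scores about 3.64, while a product averaging 4.5 over 100 reviews scores about 4.41, so the well-reviewed product ranks higher even though its raw average is lower.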

Thomas Theunen