6

I'm working on the Product Catalog module of an Invoicing application.

When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalog.

How can I store this "usage recency/frequency" in the database?

I'm thinking about adding a new field recency which would be increased by 1 every time the product was used, and decreased by 1/(count of all products), when an other product is used. Then use this recency field for ordering, but it doesn't seem to me the best solution.

Can you help me what is the best practice for this kind of problem?

Adam
  • 4,985
  • 2
  • 29
  • 61
  • @downvoter, why is the downvote? – Adam Jun 22 '17 at 09:15
  • It seems to me that your aim is to track last used product and display new invoice pre-populated with most recently used product. Hence, only one instance of that information will be sufficient which can be stored at application level. – MKR Jun 24 '17 at 22:47
  • 3
    I did not down-vote it, but I am tempted to. The use of the "recency/frequency" has not been explained. For example: Which is more important -- the product bought 10 times a year ago, or the product bought only once, but yesterday? – Rick James Jun 26 '17 at 22:15
  • @RickJames, I do not really know, it is part of the question. This is why I am asking for best practices. – Adam Jun 27 '17 at 09:31

7 Answers7

6

Solution for the recency calculation:

Create a new column in the products table, named last_used_on for example. Its data type should be TIMESTAMP (the MySQL representation for the Unix-time).

Advantages:

  • Timestamps contains both date and time parts.
  • It makes possible VERY precise calculations and comparisons in regard to dates and times.
  • It lets you format the saved values in the date-time format of your choice.
  • You can convert from any date-time format into a timestamp.
  • In regard to your autocomplete fields, it allows you to filter the products list as you wish. For example, to display all products used since [date-time]. Or to fetch all products used between [date-time-1] and [date-time-2]. Or get the products used only on Mondays, at 1:37:12 PM, in the last two years, two months and three days (so flexible timestamps are).

Resources:

Solution for the usage rate calculation:

Well, actually, you are not speaking about a frequency calculation, but about a rate - even though one can argue that frequency is a rate, too.

Frequency implies using the time as the reference unit and it's measured in Hertz (Hz = [1/second]). For example, let's say you want to query how many times a product was used in the last year.

A rate, on the other hand, is a comparison, a relation between two related units. Like for example the exchange rate USD/EUR - they are both currencies. If the comparison takes place between two terms of the same type, then the result is a number without measurement units: a percentage. Like: 50 apples / 273 apples = 0.1832 = 18.32%

That said, I suppose you tried to calculate the usage rate: the number of usages of a product in relation with the number of usages of all products. Like, for a product: usage rate of the product = 17 usages of the product / 112 total usages = 0.1517... = 15.17%. And in the autocomplete you'd want to display the products with a usage rate bigger than a given percentage (like 9% for example).

This is easy to implement. In the products table add a column usages of type int or bigint and simply increment its value each time a product is used. And then, when you want to fetch the most used products, just apply a filter like in this sql statement:

SELECT
    id, 
    name, 
    (usages*100) / (SELECT sum(usages) as total_usages FROM products) as usage_rate
FROM products 
GROUP BY id
HAVING usage_rate > 9
ORDER BY usage_rate DESC;

Here's a little study case:

Study case

In the end, recency, frequency and rate are three different things.

Good luck.

4

To allow for future flexibility, I'd suggest the following additional (*) table to store the entire history of product usage by all users:

Name: product_usage

Columns:

  • id - internal surrogate auto-incrementing primary key
  • product_id (int) - foreign key to product identifier
  • user_id (int) - foreign key to user identifier
  • timestamp (datetime) - date/time the product was used

This would allow the query to be fine tuned as necessary. E.g. you may decide to only order by past usage for the logged in user. Or perhaps total usage within a particular timeframe would be more relevant. Such a table may also have a dual purpose of auditing - e.g. to report on the most popular or unpopular products amongst all users.

(*) assuming something similar doesn't already exist in your database schema

Steve Chambers
  • 37,270
  • 24
  • 156
  • 208
3

Your problem is related to many other web-scale search applications, such as e.g. showing spell corrections, related searches, or "trending" topics. You recognized correctly that both recency and frequency are important criteria in determining "popular" suggestions. In practice, it is desirable to compromise between the two: Recency alone will suffer from random fluctuations; but you also don't want to use only frequency, since some products might have been purchased a lot in the past, but their popularity is declining (or they might have gone out of stock or replaced by successor models).

A very simple but effective implementation that is typically used in these scenarios is exponential smoothing. First of all, most of the time it suffices to update popularities at fixed intervals (say, once each day). Set a decay parameter α (say, .95) that tells you how much yesterday's orders count compared to today's. Similarly, orders from two days ago will be worth α*α~.9 times as today's, and so on. To estimate this parameter, note that the value decays to one half after log(.5)/log(α) days (about 14 days for α=.95).

The implementation only requires a single additional field per product, orders_decayed. Then, all you have to do is to update this value each night with the total daily orders:

orders_decayed = α * orders_decayed + (1-α) * orders_today.

You can sort your applicable suggestions according to this value.

stefan.schroedl
  • 866
  • 9
  • 19
2

To have an individual user experience, you should not rely on a field in the product table, but rather on the history of the user.

The occurrences of the product in past invoices created by the user would be a good starting point. The advantage is that you don't need to add fields or tables for this functionality. You simply rely on data that is already present anyway.

Since it is an auto-complete field, maybe past usage is not really relevant. Display n search results as the user types. If you feel that results are better if you include recency in the calculation of the order, go with it.

Lorenz Meyer
  • 19,166
  • 22
  • 75
  • 121
2

Now, implementation may defer depending on how and when product should be displayed. Whether it has to be user specific usage frequency or application specific (overall). But, in both case, I would suggest to have a history table, which later you can use for other analysis.

You could design you history table with atleast below columns:

Id  | ProductId | LastUsed (timestamp) | UserId 

And, now you can create a view, which will query this table for specific time range (something like product frequency of last week, last month or last year) and will give you highest sold product for specific time range.

Same can be used for User's specific frequency by adding additional condition to filter by Userid.

I'm thinking about adding a new field recency which would be increased by 1 every time the product was used, and decreased by 1/(count of all products), when an other product is used. Then use this recency field for ordering, but it doesn't seem to me the best solution.

Yes, it is not a good practice to add a column for this and update every time. Imagine, this product is most awaiting product and people love to buy it. Now, at a time, 1000 people or may be more requested for this product and for every request you are going to update same row, since to maintain the concurrency database has to lock that specific row and update for each request, which is definitely going to hit your database and application performance instead you can simply insert a new row.


The other possible solution is, you could use your existing invoice table as it will definitely have all product and user specific information and create a view to get frequently used product as I mentioned above.


Please note that, this is an another option to achieve what you are expecting. But, I would personally recommend to have history table instead.

Ravi
  • 30,829
  • 42
  • 119
  • 173
1

The scenario

When the user creates a new invoice the product name field should be an autocomplete field which shows the most recently used products from the product catalogue.

your suggested solution

How can I store this "usage recency/frequency" in the database?

If it is a web application, don't store it in a Database in your server. Each user has different choices.

Store it in the user's browser as Cookie or Localstorage because it will improve the User Experience.

If you still want to store it in MySQL table,

Do the following

  1. Create a column recency as said in question.

  2. When each time the item used, increase the count by 1 as said in question.

  3. Don't decrease it when other items get used.

  4. To get the recent most used item,

query

SELECT * FROM table WHERE recence = (SELECT MAX(recence) FROM table);

Side note

Go for the database use only if you want to show the recent most used products without depending the user.

Sagar V
  • 12,158
  • 7
  • 41
  • 68
1

As you aren't certain on wich measure to choose, and it's rather user experience related problem, I advice you have a number of measures and provide a user an option to choose one he/she prefers. For example the set of available measures could include most popular product last week, last month, last 3 months, last year, overall total. For the sake of performance I'd prefer to store those statistics in a separate table which is refreshed by a scheduled job running every 3 hours for example.

Serg
  • 22,285
  • 5
  • 21
  • 48