
The table below is an example of my dataframe. I am using the lifetimes package.

Organization id   Lifetime_orders   DOY (most recent date by day)   Tenure
54302             22                69                              43
32453             4                 72                              44

This is the code I'm using to fit bgModel with the lifetimes package:

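from lifetimes import BetaGeoFitter
from lifetimes.plotting import plot_period_transactions, plot_frequency_recency_matrix, plot_probability_alive_matrix, plot_history_alive

bgModel = BetaGeoFitter()
bgModel.fit(new_df['lifetime_orders'], df['tenure'], df['DOY'])

# summary is a property: a DataFrame of the fitted parameters with their confidence bounds
bgModel.summary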

But when I run the code I get a ValueError: Some values in recency vector are larger than T vector.

I expected a summary table showing the coefficients with their upper and lower bounds. I tried changing the data types to float, object and int, but that did not help. I also looked at the package's GitHub repository (https://github.com/CamDavidsonPilon/lifetimes/blob/master/lifetimes/utils.py), but it was not very helpful.

Ynjxsjmh


Check the definitions of frequency, recency and tenure (T) in the documentation; they might not be what you think:

For all models, the following nomenclature is used:

  • frequency represents the number of repeat purchases the customer has made. This means that it’s one less than the total number of purchases. This is actually slightly wrong. It’s the count of time periods the customer had a purchase in. So if using days as units, then it’s the count of days the customer had a purchase on.
  • T represents the age of the customer in whatever time units chosen ... This is equal to the duration between a customer’s first purchase and the end of the period under study.
  • recency represents the age of the customer when they made their most recent purchases. This is equal to the duration between a customer’s first purchase and their latest purchase. (Thus if they have made only 1 purchase, the recency is 0.)
  • monetary_value represents the average value of a given customer’s purchases. This is equal to the sum of all a customer’s purchases divided by the total number of purchases. Note that the denominator here is different than the frequency described above.
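To make these definitions concrete, here is a minimal stdlib sketch that derives frequency, recency and T (in days) for one customer from a list of purchase dates. It assumes daily periods and an observation period ending on `observation_end`, and it counts frequency as distinct purchase days minus one (repeat periods), which matches my understanding of how lifetimes' own `summary_data_from_transaction_data` helper in `lifetimes.utils` counts it. The function name `rfm_from_dates` is hypothetical.

```python
from datetime import date


def rfm_from_dates(purchase_dates, observation_end):
    """Hypothetical helper: return (frequency, recency, T) in days for one customer.

    Assumes daily periods and that every purchase falls on or before observation_end.
    """
    days = sorted(set(purchase_dates))   # several purchases on one day count as one period
    first, last = days[0], days[-1]
    frequency = len(days) - 1            # repeat purchase periods
    recency = (last - first).days        # age of the customer at their latest purchase
    T = (observation_end - first).days   # age of the customer at the end of the study period
    return frequency, recency, T


end = date(2023, 3, 16)
purchases = [date(2023, 3, 6), date(2023, 3, 6), date(2023, 3, 10), date(2023, 3, 13)]
print(rfm_from_dates(purchases, end))  # (2, 7, 10)
```

Note that recency and T are both measured from the customer's first purchase, not from a calendar date such as a day of the year.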

Your DOY looks a bit like recency, but note that for this model, recency is the age of the customer when they made their most recent purchase. T is the age of the customer today, so if you've calculated those both correctly, recency cannot be greater than T, which is what the ValueError is telling you.

For example, if I first shopped 10 days ago, and last shopped 3 days ago, my recency is 7, and my T is 10.
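The same arithmetic, and the constraint behind the ValueError, can be sketched in plain Python. This is a minimal sketch and not lifetimes' actual validation code: `find_invalid_rows` is a hypothetical helper that takes per-customer dicts with `id`, `recency` and `T` keys (an assumed layout) and reports the rows that break 0 <= recency <= T. Flagged rows usually point to a calculation mistake (for example, a date-derived value in the wrong column) that is worth fixing at the source.

```python
from datetime import date


def find_invalid_rows(rows):
    """Hypothetical helper: return the ids of rows that break 0 <= recency <= T."""
    return [r["id"] for r in rows if not (0 <= r["recency"] <= r["T"])]


# Worked example: first purchase 10 days ago, latest purchase 3 days ago.
today = date(2023, 3, 16)
first, last = date(2023, 3, 6), date(2023, 3, 13)
recency = (last - first).days  # 7
T = (today - first).days       # 10

rows = [
    {"id": "ok", "recency": recency, "T": T},
    {"id": "recency_too_large", "recency": 12, "T": 5},  # e.g. a value put in the wrong column
    {"id": "negative", "recency": -1, "T": 5},
]
print(find_invalid_rows(rows))  # ['recency_too_large', 'negative']
```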

Also, check the order of your arguments. It should be bgModel.fit(data['frequency'], data['recency'], data['T']); it looks like you have tenure (T) and recency swapped.
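One way to avoid mixing up the order is to pass the columns by name. As far as I know, `BetaGeoFitter.fit` in lifetimes names its first three parameters `frequency`, `recency` and `T`, so keyword arguments work. The sketch below uses a hypothetical helper, `to_fit_kwargs`, that turns per-customer dicts (an assumed layout) into those three named sequences; it deliberately avoids importing lifetimes so it runs on its own.

```python
def to_fit_kwargs(customers):
    """Hypothetical helper: build keyword arguments for BetaGeoFitter.fit.

    `customers` is assumed to be a list of dicts with 'frequency', 'recency' and 'T' keys.
    """
    return {
        "frequency": [c["frequency"] for c in customers],
        "recency": [c["recency"] for c in customers],
        "T": [c["T"] for c in customers],
    }


customers = [
    {"frequency": 2, "recency": 7, "T": 10},
    {"frequency": 0, "recency": 0, "T": 30},
]
kwargs = to_fit_kwargs(customers)
print(kwargs["recency"], kwargs["T"])  # [7, 0] [10, 30]
```

With lifetimes installed, the call would then be `bgModel.fit(**kwargs)`. I believe `fit` converts its inputs to NumPy arrays, so plain lists should be accepted, but check this against your installed version. With a DataFrame that already has columns named `frequency`, `recency` and `T`, the equivalent is `bgModel.fit(frequency=df['frequency'], recency=df['recency'], T=df['T'])`.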

Finally, it's worth pointing out that the lifetimes package is now in maintenance mode (see the README). There is a successor in PyMC-Marketing, but note that this package is relatively early in its development cycle (v0.0.4 at the time of writing).

s_pike
  • Thank you for your response. I had to filter out all the rows where recency was greater than T or where recency was negative, but then I ran into more errors, which I'm going to assume are due to the maintenance mode. – Yiga Ngbogbara Mar 16 '23 at 15:41