
I have a dataset from our campaign that looks like this:

| Customer | Province | District | City | Age | No. of Order |
| -------- | -------  | -------- | -----| ----| -------      |
| A        | P1       | D1       | C1   | 21  | 5            |
| B        | P2       | D2       | C2   | 22  | 9            |
....

I need to find the most impactful group of customers (there are usually more than 20 categorical features to group by). For example: "Customers from Province P1, District D1, Age 25 are the most promising group because they contributed 50% of total orders while making up only 10% of our customer base."

I'm currently using Pandas to loop through every combination of 2, 3, or 4 of my categorical features and calculate the share of orders for each group, but this is very time-consuming.
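For reference, here is a minimal sketch of the kind of loop I mean. The function name `rank_groups`, the `lift` column (share of orders divided by share of customers), and the toy data are illustrative assumptions, not my actual code or data.

```python
from itertools import combinations

import pandas as pd


def rank_groups(df, features, target, sizes=(2, 3, 4), min_customers=1):
    """Score every group formed by combinations of `features`.

    Hypothetical helper: lift = (group's share of orders) / (group's share of customers).
    """
    total_orders = df[target].sum()
    total_customers = len(df)
    frames = []
    for k in sizes:
        for cols in combinations(features, k):
            # One row per distinct value combination of the chosen columns.
            stats = (
                df.groupby(list(cols))[target]
                .agg(orders="sum", customers="count")
                .reset_index()
            )
            # Human-readable label such as "Province=P1, District=D1".
            stats["group"] = stats[list(cols)].astype(str).apply(
                lambda row: ", ".join(f"{c}={v}" for c, v in zip(cols, row)),
                axis=1,
            )
            frames.append(stats[["group", "orders", "customers"]])
    result = pd.concat(frames, ignore_index=True)
    result = result[result["customers"] >= min_customers].copy()
    result["order_share"] = result["orders"] / total_orders
    result["customer_share"] = result["customers"] / total_customers
    result["lift"] = result["order_share"] / result["customer_share"]
    return result.sort_values("lift", ascending=False).reset_index(drop=True)


# Toy data, made up for illustration only.
toy = pd.DataFrame(
    {
        "Province": ["P1", "P1", "P2", "P2"],
        "District": ["D1", "D1", "D2", "D2"],
        "Age": [21, 21, 22, 22],
        "Orders": [5, 5, 1, 1],
    }
)
ranked = rank_groups(toy, ["Province", "District", "Age"], "Orders", sizes=(2,))
print(ranked.head())
```

With more than 20 features the number of column combinations grows very quickly, which is why this approach becomes slow.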

Is there an existing Python package that can help find this kind of group?

tripleee
HDS2002

1 Answer


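You can automate this with decision trees: fit a tree on the customer features with the order count as the target, and each leaf then describes a group of customers defined by the splits leading to it.

Not all features may be useful. You can reduce them first with PCA (principal component analysis), keeping in mind that PCA works on numeric input, so categorical columns need to be encoded beforehand.

The scikit-learn package provides both.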
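A minimal sketch of the decision tree idea, under some assumptions: the helper name `fit_segment_tree` and the toy data are made up for illustration, every feature (including Age) is treated as categorical and one-hot encoded, and the tree is a regressor on the per-customer order count. It is a starting point, not a tuned solution.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text


def fit_segment_tree(df, features, target, max_depth=3, min_samples_leaf=1):
    """Fit a shallow regression tree whose leaves describe customer segments.

    Hypothetical helper: all features are cast to strings and one-hot encoded,
    which is a simplifying assumption (Age could also be kept numeric).
    """
    X = pd.get_dummies(df[features].astype(str))
    tree = DecisionTreeRegressor(
        max_depth=max_depth,
        min_samples_leaf=min_samples_leaf,
        random_state=0,
    )
    tree.fit(X, df[target])
    return tree, list(X.columns)


# Toy data, made up for illustration only.
toy = pd.DataFrame(
    {
        "Province": ["P1", "P1", "P2", "P2"],
        "District": ["D1", "D1", "D2", "D2"],
        "Age": [21, 21, 22, 22],
        "Orders": [5, 5, 1, 1],
    }
)
features = ["Province", "District", "Age"]
tree, columns = fit_segment_tree(toy, features, "Orders")

# Each root-to-leaf path is a candidate group; the leaf value is its mean order count.
print(export_text(tree, feature_names=columns))
```

On real data, raising `min_samples_leaf` keeps the tree from reporting groups that contain only a handful of customers, and `max_depth` caps how many features can define a single group.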

Abhi25t