Is there a Python package that can find the most impactful group (categorical features) in my data?

Question

I have a dataset from our campaign that looks like this:

| Customer | Province | District | City | Age | No. of Order |
| -------- | -------- | -------- | ---- | --- | ------------ |
| A        | P1       | D1       | C1   | 21  | 5            |
| B        | P2       | D2       | C2   | 22  | 9            |
| ...      | ...      | ...      | ...  | ... | ...          |

I need to find the most impactful group of customers (there are usually more than 20 categorical groups). For example: "Customers from Province P1, District D1, Age 25 are the most promising group because they contributed 50% of total orders while making up only 10% of our customer base".

I'm currently using pandas to loop through every combination of 2, 3, and 4 of my categorical features and calculate the sales proportion for each group, but this is very time-consuming.
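For illustration, a minimal sketch of that kind of loop is below. It assumes the column names from the table above; the function name `rank_groups`, the `lift` column (order share divided by customer share), and the toy rows are hypothetical and only there to make the example runnable.

```python
from itertools import combinations

import pandas as pd


def rank_groups(df, features, target, sizes=(2, 3, 4)):
    """Score each value combination by its order share relative to its customer share."""
    total_orders = df[target].sum()
    total_customers = len(df)
    results = []
    for k in sizes:
        for cols in combinations(features, k):
            # One groupby per feature combination; only observed value combinations appear.
            stats = df.groupby(list(cols))[target].agg(orders="sum", customers="count")
            stats["order_share"] = stats["orders"] / total_orders
            stats["customer_share"] = stats["customers"] / total_customers
            # lift > 1 means the group orders more than its size alone would suggest.
            stats["lift"] = stats["order_share"] / stats["customer_share"]
            for values, row in stats.iterrows():
                if not isinstance(values, tuple):
                    values = (values,)  # single-feature groups have a scalar index
                results.append({"group": dict(zip(cols, values)), **row.to_dict()})
    return pd.DataFrame(results).sort_values("lift", ascending=False, ignore_index=True)


# Hypothetical toy data, not taken from the real campaign.
demo = pd.DataFrame({
    "Province": ["P1", "P1", "P2", "P2"],
    "District": ["D1", "D1", "D2", "D2"],
    "City": ["C1", "C1", "C2", "C2"],
    "No. of Order": [8, 8, 2, 2],
})
ranked = rank_groups(demo, ["Province", "District", "City"], "No. of Order", sizes=(2,))
print(ranked.head())
```

The number of feature combinations grows quickly with more than 20 features, which is where the run time goes.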

Is there already a Python package that can help find this kind of group?

score 0 · Answer 1 · answered Jan 22 '21 at 07:44

You can automate that by using Decision Trees.

Not all features may be useful. Eliminate trivial ones using PCA (principal component analysis).

You can use the scikit-learn package for both of the above.
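A rough sketch of the decision-tree idea follows. It assumes the categorical columns are one-hot encoded with `pandas.get_dummies` and that a shallow `DecisionTreeRegressor` is fit on the order count; the toy rows, the `max_depth` and `min_samples_leaf` values, and the per-leaf summary are illustrative assumptions, not a tuned setup. Each leaf of the tree corresponds to one customer group, and the printed rules show which feature values define it.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data shaped like the table in the question.
df = pd.DataFrame({
    "Province": ["P1", "P1", "P1", "P2", "P2", "P2", "P3", "P3"],
    "District": ["D1", "D1", "D2", "D3", "D3", "D4", "D5", "D5"],
    "Age": [21, 25, 25, 22, 30, 41, 35, 28],
    "No. of Order": [9, 10, 4, 2, 1, 2, 3, 2],
})
target = "No. of Order"

# One-hot encode the categorical columns; Age stays numeric.
X = pd.get_dummies(df.drop(columns=[target]), columns=["Province", "District"], dtype=int)
y = df[target]

# A shallow tree keeps the resulting groups few and readable.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=2, random_state=0)
tree.fit(X, y)

# Human-readable split rules; each leaf is one candidate group.
rules = export_text(tree, feature_names=list(X.columns))
print(rules)

# Summarise each leaf: its share of orders versus its share of customers.
summary = df.assign(leaf=tree.apply(X)).groupby("leaf")[target].agg(
    orders="sum", customers="count"
)
summary["order_share"] = summary["orders"] / y.sum()
summary["customer_share"] = summary["customers"] / len(df)
print(summary.sort_values("order_share", ascending=False))
```

Leaves whose `order_share` is well above their `customer_share` are the groups worth a closer look.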

Abhi25t
