0

From my analysis I have discovered that Disloyal 30-40 year old customers are Not Satisfied with Company X. "Not Satisfied" means they have rated services and products 0-2 out of a possible 5. I want to know what inputs were ranked <=2.

I stored the columns in a list to use in a for loop so I could index the relevant column values which are rankings 0-5.

What is the syntax for using the column variable in the boolean expression?

Example Data:

Customer Type    Age    Satisfaction    Design   Food    Wi-Fi    Service    Distance
     Disloyal     28   Not Satisfied         0      1        2          2        13.5
        Loyal     30       Satisfied         5      3        5          4        34.2
     Disloyal     36   Not Satisfied         2      0        2          4        55.8

Code

ranked_cols = ['Design', 'Food', 'Wi-Fi', 'Service', 'Distance']

for column in df[ranked_cols]:
    columnSeriesObj = df[column]

sub = df[
(df["Customer Type"] == "Disloyal")
& (df["Satisfaction"] == "Not Satisfied")
& df["Age"].between(30, 40)
]

sub[(sub[ranked_cols] <= 2)].shape[0]

(sub.melt(value_vars=[c for c in sub.columns if c.startswith(column)])
.groupby("variable")
.value_counts()
.to_frame()
.reset_index()
.rename(columns={0: "count"}))
Edison
  • 11,881
  • 5
  • 42
  • 50

1 Answers1

1

Try this:

# Choose the cols you want to see the ratings for
ranked_cols = [
    "Design",
    "Food",
    "Wi-Fi",
    "Service",
]

# Select the relevant customers
sub = df[
    (df["Customer Type"] == "Disloyal")
    & (df["Satisfaction"] == "Not Satisfied")
    & df["Age"].between(30, 40)
]

(
    sub.melt(value_vars=ranked_cols)
    .groupby("variable")
    .value_counts()
    .to_frame()
    .reset_index()
    .rename(columns={"value": "rating", 0: "count"})
)

This will output a DataFrame contaning all the ranked_cols categories and their respective rating and how many times that rating was given (count):

    variable  rating  count
0   Design    2       1
1   Food      0       1 
2   Service   4       1
3   Wi-Fi     2       1
965311532
  • 506
  • 3
  • 14
  • Thanks again ;) Hey, it prints a massive list because of all the different counts and variables. Is there a way for the print to only output what I need with totals to confirm those total customers? – Edison Jun 10 '22 at 12:26
  • You can assign it to a variable and then use boolean indexing on it if you need to select only certain products, for example if we assingn the above to the variable `prods` ,`prods[prods.variable == "Design"]` will print only the rows that have the `Design` variable – 965311532 Jun 10 '22 at 12:30
  • You can do `prods[prods.rating <=2]` – 965311532 Jun 10 '22 at 13:07
  • If you're up for a challenge, here's a new [related question](https://stackoverflow.com/questions/72640322/return-customer-age-segments-within-product-ranking-groupings). – Edison Jun 16 '22 at 04:06