1

I am returning the ratings of features for a business by demographic and ratings conditions.

How would I sum the int64 values of the "count" column for all variables while including the variable names in the output e.g. Design 8,Food 1 ?

This question mentions coverting to index then selecting by index.
This question looks like a similar problem with SQL.

Currently, I can query by assigning the melt code-block to a variable prod then write something like this. prods[prods.rating == 2].

Example Data:

Customer Type    Age    Satisfaction    Design   Food    Wi-Fi    Service    Distance
     Disloyal     28   Not Satisfied         0      1        2          2        13.5
        Loyal     30       Satisfied         5      3        5          4        34.2
     Disloyal     36   Not Satisfied         2      0        2          4        55.8

# Cols I want to see the ratings for
ranked_cols = [
    "Design",
    "Food",
    "Wi-Fi",
    "Service",
]

# Select the relevant customers
sub = df[
    (df["Customer Type"] == "Disloyal")
    & (df["Satisfaction"] == "Not Satisfied")
    & df["Age"].between(30, 40)
]

(
    sub.melt(value_vars=ranked_cols)
    .groupby("variable")
    .value_counts()
    .to_frame()
    .reset_index()
    .rename(columns={"value": "rating", 0: "count"})
)

[Out]

    variable  rating  count
0   Design    2       5
1   Food      0       1 
2   Service   4       1
3   Wi-Fi     2       3
4   Design    1       3
Edison
  • 11,881
  • 5
  • 42
  • 50

1 Answers1

1
df.groupby("variable").sum()["count"]

Output:

variable
Design     8
Food       1
Service    1
Wi-Fi      3
Name: count, dtype: int64
965311532
  • 506
  • 3
  • 14
  • New [question](https://stackoverflow.com/questions/72631416/returning-column-values-used-in-a-condition-set-on-a-melt) – Edison Jun 15 '22 at 12:29