1

I have a dataframe where many rows belong to one entity but are split by a scenario. For example:

df = pd.DataFrame({"Entity":['AAA', 'AAA, 'AAA, 'BBB','BBB', 'CCC'],
                "Scenario":['x', 'y', 'z', 'y', 'z', 'x'],
                "val_2018":[100, 200, 100, 0, 3, 40],
                "val_2019":[150, 150, 120, 0, 0, 60]})

I would like a dataframe showing me whether the scenario exists for each entity on one row. So the output for the above would be:

Entity    x    y    z
AAA    True   True   True
BBB    False  True   True
CCC    True   False  False

Or instead of boolean T/F - 1 and 0 would be even better - essentially a count.

So far I have tried:

table = pd.pivot_table(data=df, columns=["Scenario"], index="Entity", aggfunc='count', fill_value=0)

Although it almost gives me what I require, it repeats the values for every other column - like a multiIndex. So val_2018 and val_2019 have the scenario repeated.

Unfortunately I can't just drop these as I have many other columns I haven't specified in the example above.

Does anyone know how I can do this succinctly?

Stu
  • 115
  • 7

1 Answers1

2

Try pd.crosstab:

print(pd.crosstab(df["Entity"], df["Scenario"]))

Prints:

Scenario  x  y  z
Entity           
AAA       1  1  1
BBB       0  1  1
CCC       1  0  0
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91