
I'm a bit new to this, so please be gentle. I have a dataframe structured like the table below, and I'd like to group by column "P", make new columns for the distinct/unique values in column "U", and then count the instances of those values.

P U
p1 u1
p1 u1
p1 u3
p2 u1
p2 u2
p2 u3

Essentially I'd like the output to look like this.

P u1 u2 u3
p1 2 0 1
p2 1 1 1

I'm not sure how to articulate what I'm trying to do, or what the terminology is, so I can't Google it myself. Could someone name the pandas/Python method that's best suited for this, so I can look up examples on my own? Thanks!

kkhazae


You can group by "P", count the values in column "U" with value_counts(), and then call unstack() to turn the distinct U values into columns.

import pandas as pd
import io

s = '''P    U
p1  u1
p1  u1
p1  u3
p2  u1
p2  u2
p2  u3'''

# Raw string avoids the invalid-escape warning for "\s" in newer Python versions
df = pd.read_csv(io.StringIO(s), sep=r"\s+")
df.groupby("P")["U"].value_counts().unstack(fill_value=0)

# Output:
U   u1  u2  u3
P           
p1  2   0   1
p2  1   1   1
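To see what each step does, it can help to print the intermediate result. In the sketch below (the names `counts` and `wide` are just illustrative), `value_counts()` on the grouped column returns a Series indexed by both P and U, and `unstack()` then pivots the inner U level into columns. Depending on your pandas version, that intermediate Series may be named `U` or `count`.

```python
import io

import pandas as pd

s = """P    U
p1  u1
p1  u1
p1  u3
p2  u1
p2  u2
p2  u3"""

df = pd.read_csv(io.StringIO(s), sep=r"\s+")

# Step 1: count each U value within each P group.
# The result is a Series with a two-level (P, U) MultiIndex.
counts = df.groupby("P")["U"].value_counts()
print(counts)

# Step 2: unstack() moves the inner index level (U) into the columns,
# giving one row per P and one column per distinct U value.
wide = counts.unstack(fill_value=0)
print(wide)
```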

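Note that passing fill_value=0 to unstack() replaces the missing values (P/U combinations that never occur, such as p1/u2) with 0. Without it, those cells become NaN and the counts are upcast to float: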

# Without fill_value
df.groupby("P")["U"].value_counts().unstack()

# Output:
U   u1  u2  u3
P           
p1  2.0 NaN 1.0
p2  1.0 1.0 1.0
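If you already have the NaN version, a minimal sketch of recovering integer counts is to fill the gaps and cast back. The variable names below are illustrative; the point is that `fillna(0).astype("int64")` should give the same table as passing `fill_value=0` to `unstack()` up front.

```python
import io

import pandas as pd

df = pd.read_csv(
    io.StringIO("P U\np1 u1\np1 u1\np1 u3\np2 u1\np2 u2\np2 u3"), sep=r"\s+"
)

# Without fill_value, the missing (p1, u2) cell is NaN.
# NaN cannot live in an integer column, so pandas upcasts the counts to float64.
raw = df.groupby("P")["U"].value_counts().unstack()
print(raw.dtypes)

# Fill the gaps after the fact, then cast back to integers.
fixed = raw.fillna(0).astype("int64")

# Same table as asking unstack() to fill the gaps directly.
direct = df.groupby("P")["U"].value_counts().unstack(fill_value=0)
print(fixed.equals(direct))
```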
Denny Chen
  • Hey, thanks for your answer! I'll try it out. But I was wondering why I'd use this over the recommended pd.crosstab()? – kkhazae Apr 13 '22 at 16:45
  • @kkhazae I think this [post](https://stackoverflow.com/questions/36267745/how-is-a-pandas-crosstab-different-from-a-pandas-pivot-table) will answer your question. – Denny Chen Apr 14 '22 at 04:44
  • Basically, simpler code is better, and for data and a task this simple I would also recommend `crosstab()`. When the data gets bigger, speed also becomes a concern, and if you need external processes or are working collaboratively with others, there are more issues to deal with, too. Unfortunately, I am not an expert in computer science or a pandas master, so that is all I can tell you. This answer is just provided as an alternative. – Denny Chen Apr 14 '22 at 04:51
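For comparison with the comments above, here is a minimal sketch of the same table built with `pd.crosstab()` and with `DataFrame.pivot_table()`. `crosstab` counts each (P, U) pair and fills absent pairs with 0 by default; `pivot_table` needs `aggfunc="size"` to count rows per cell and `fill_value=0` for the empty cells. The variable names are illustrative.

```python
import io

import pandas as pd

df = pd.read_csv(
    io.StringIO("P U\np1 u1\np1 u1\np1 u3\np2 u1\np2 u2\np2 u3"), sep=r"\s+"
)

# Cross-tabulation: rows are P values, columns are U values,
# cells are the number of times each pair occurs (0 if it never does).
ct = pd.crosstab(df["P"], df["U"])
print(ct)

# pivot_table equivalent: "size" counts the rows falling into each cell.
pt = df.pivot_table(index="P", columns="U", aggfunc="size", fill_value=0)
print(pt)
```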