I have a dataframe (df) that looks like this:
visitor  label  response  products  number  pct
abc123   color  blue      3         1       0.333
def456   size   4 x 4     5         5       1.0
def456   shape  round     5         5       1.0
I created a new dataframe that only has records where pct = 1.0 with this code:
df2 = df[df["pct"] == 1.0].copy()
Then I'd like to count the number of records per visitor and store the count in a new column called num_same. I tried this:
df2["num_same"] = df2.groupby("visitor").aggregate('count')
But my results are all NaN.
I've also tried:
df2["num_same"] = df2.groupby("visitor").size()
How can I get the results I want without the NaNs?
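For reference, here is a minimal sketch of one possible approach. It assumes the grouping column is named visitor, as in the sample table, and that pandas is installed. The likely cause of the NaNs is index alignment: groupby(...).size() returns one value per group, indexed by the visitor values, while df2 is indexed by row number, so the assignment finds no matching labels. transform("size") returns one value per original row instead, so the result lines up with df2.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "visitor": ["abc123", "def456", "def456"],
    "label": ["color", "size", "shape"],
    "response": ["blue", "4 x 4", "round"],
    "products": [3, 5, 5],
    "number": [1, 5, 5],
    "pct": [0.333, 1.0, 1.0],
})

# Keep only the rows where pct equals 1.0; .copy() leaves df untouched.
df2 = df[df["pct"] == 1.0].copy()

# transform("size") broadcasts each group's row count back to every row
# of that group, so the result shares df2's index and no NaN appears.
df2["num_same"] = df2.groupby("visitor")["pct"].transform("size")

print(df2)
```

With the sample data, only the two def456 rows remain after filtering, and each gets num_same = 2.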