
Background: I was hoping to generate a new column named datasample based on another column, end_bin, in a table.

Question: Is there a way to fill each row of the new column with the total number of times that row's end_bin value occurs, so that a repeated value shows the same (maximum) count in every row where it appears?

Expected result:

end_bin datasample
6 1
8 1
10 1
2 3
3 1
2 3
2 3

I couldn't find a method to do this in pandas; any help is appreciated :)

  • Can you explain your output? Why is the first value `1`? – jezrael Mar 29 '22 at 09:41
  • The first value is simply the number of occurrences of that value in end_bin, i.e. 6 occurs just once in end_bin, and so do the other values except 2, which occurs 3 times in total. Hence 3 is displayed in every row whose end_bin value is 2. – Pranav Arora Mar 29 '22 at 09:48
  • I hope this is what you are looking for:

        import pandas as pd
        _df = pd.DataFrame(data={"end_bin": [6, 8, 10, 2, 3, 2, 2]})
        count_ser = _df["end_bin"].value_counts()
        _df["datasample"] = _df["end_bin"].map(count_ser)
        _df

    – JAbr Mar 29 '22 at 10:04
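The rule described in the comments (each row gets the total number of times its end_bin value occurs in the whole column) can be sketched without pandas. This is only an illustration of the counting logic using the standard library's collections.Counter; the plain list `end_bin` stands in for the column and is an assumption for the example.

```python
from collections import Counter

# Stand-in for the end_bin column from the question (plain list, not a DataFrame)
end_bin = [6, 8, 10, 2, 3, 2, 2]

# Count how often each distinct value occurs across the whole column
counts = Counter(end_bin)

# Every row gets the total count of its own value, so all three 2s get 3
datasample = [counts[v] for v in end_bin]
print(datasample)  # [1, 1, 1, 3, 1, 3, 3]
```

The pandas approaches in this thread do the same thing: count each distinct value once, then look that count up for every row.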

1 Answer


Your question is unclear, but it looks like you want the size of each group, broadcast back to every row:

df['datasample'] = df.groupby('end_bin')['end_bin'].transform('size')

Output:

   end_bin  datasample
0        6           1
1        8           1
2       10           1
3        2           3
4        3           1
5        2           3
6        2           3
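For a self-contained check, here is a minimal sketch that builds the sample frame from the question and applies the same `groupby(...).transform('size')` call, then compares the result with the expected output listed in the question. The `expected` list is copied from the question; the variable names are only for illustration.

```python
import pandas as pd

# Sample data copied from the question's end_bin column
df = pd.DataFrame({'end_bin': [6, 8, 10, 2, 3, 2, 2]})

# transform('size') returns one value per original row: the number of rows
# in that row's end_bin group, aligned to df's index
df['datasample'] = df.groupby('end_bin')['end_bin'].transform('size')

# Expected result as given in the question
expected = [1, 1, 1, 3, 1, 3, 3]
print(df['datasample'].tolist() == expected)  # True
```

The choice of `transform` matters here: an aggregation such as `df.groupby('end_bin').size()` returns one row per distinct value (5 rows for this data), whereas `transform` broadcasts the group size back to all 7 original rows so it can be assigned directly as a new column.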
mozway