I need to change the group label of rows whose group does not have enough points. For example, given this dataframe:
+-----+
|c1|c2|
+-----+
|A |1 |
|A |2 |
|B |1 |
|A |2 |
|E |5 |
|E |6 |
|W |1 |
+-----+
Suppose I group on the values in c2, and each group must contain at least 2 points. The group sizes are:
c2:
1 : count(c1) = 3
2 : count(c1) = 2
5 : count(c1) = 1
6 : count(c1) = 1
Groups 5 and 6 have only 1 element each, so I would like to relabel those rows' c2 values to -1.
The expected result is shown below.
+-----+
|c1|c2|
+-----+
|A |1 |
|A |2 |
|B |1 |
|A |2 |
|E |-1|
|E |-1|
|W |1 |
+-----+
This is the code I have written; however, it is not updating the dataframe.
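minPts = 2  # minimum number of points a group must have
labels = df["c2"].unique()
for l in labels:
    # number of rows carrying this c2 label
    group_size = df[df["c2"] == l].shape[0]
    if group_size < minPts:
        # intended to relabel the group, but df is left unchanged
        df[df["c2"] == l]["c2"] = -1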
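A likely reason the loop has no effect is that `df[mask]["c2"] = -1` is chained indexing: `df[mask]` returns a new object, and the assignment lands on that temporary copy instead of on `df`. The following is a minimal sketch of one way around it, not the only one. It assumes the example data shown above and a threshold of 2; the names `min_pts` and `group_size` are illustrative.

```python
import pandas as pd

# Example frame from the question; min_pts = 2 is an assumed threshold.
df = pd.DataFrame({
    "c1": ["A", "A", "B", "A", "E", "E", "W"],
    "c2": [1, 2, 1, 2, 5, 6, 1],
})
min_pts = 2

# Size of the c2 group each row belongs to, aligned with df's index.
group_size = df.groupby("c2")["c2"].transform("size")

# A single .loc call selects rows and column together, so the
# assignment writes to df itself rather than to a temporary copy.
df.loc[group_size < min_pts, "c2"] = -1

print(df)
```

Computing the sizes with `transform("size")` also avoids the explicit loop over labels, since every row gets its group size in one pass.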