0

I know there have been several questions asked regarding using nunique() with groupby, but I get a different error message. The following code works fine:

d = {'PDOS': [1, 1, 2, 2], 'CPT' : ["123", "123", "456", "456"], 'BC': ["A", "A", "A", "B"], 'Other': [1,2,3,4]}
df = pd.DataFrame(data = d)

df = df.merge(df[["PDOS", "CPT", "BC"]].groupby(["PDOS", "CPT"]).count(), how = "left", on = ["PDOS", "CPT"])

and I get a new column ("BC_y") which gives me the number of BC rows per each group.

however, if I change the last line to use nunique() as follows:

df = df.merge(df[["PDOS", "CPT", "BC"]].groupby(["PDOS", "CPT"]).nunique(), how = "left", on = ["PDOS", "CPT"])

I get the error message:

You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

I think the reason may be when using nunique() the columns "PDOS" and "CPT" show up as the index as well as columns.

How can I get the number of unique values instead of the number of rows per group?

Yuca
  • 6,010
  • 3
  • 22
  • 42
SM_Erd
  • 57
  • 1
  • 7
  • can you elaborate on what you're trying to achieve? the whole issue here is the merge because it is as you say, you have PDOS and CPT bot as index and column names – Yuca Feb 13 '19 at 21:37
  • I just want a new column added to the existing dataframe that gives me the number of unique values in column "BC" for every group of "PDOS" and "CPT". So for the first two rows the value would be 1 ("A" and "A") and for the second two rows the value would be two ("A" and "B"). Let me know if you need additional clarifications. – SM_Erd Feb 13 '19 at 21:40
  • Possible duplicate of [Apply vs transform on a group object](https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object) – ALollz Feb 13 '19 at 21:41
  • I think you are looking for `df.groupby(["PDOS", "CPT"]).transform(pd.Series.nunique)` and concatenate it back? – ALollz Feb 13 '19 at 21:42
  • can you tell me if this is what you want? `df.merge(df[["PDOS", "CPT", "BC"]].groupby(["PDOS", "CPT"])[['BC']].nunique(), how = "left", on = ["PDOS", "CPT"])`? – Yuca Feb 13 '19 at 21:43
  • Perfect, thanks! Worked! So this was the way to get around the fact that the indices were used as columns? – SM_Erd Feb 13 '19 at 21:47
  • Are you talking to Allolz or me? haha – Yuca Feb 13 '19 at 21:56
  • Talking about Yuca. Thanks to Allolz for the help as well. – SM_Erd Feb 13 '19 at 22:01
  • got it. There are many ways to work around that, you could also use reset_index :) – Yuca Feb 13 '19 at 22:03

0 Answers0