I have a DataFrame containing a set of columns, all of which are booleans. I would like to get a unique count of another column (an ID column) for each of these boolean columns and their respective True/False values.

For example:

import pandas as pd

data = [
    ["ABC", True, False],
    ["DEF", False, False],
    ["GHI", False, True],
]

df = pd.DataFrame(data, columns=["ID", "CONTAINS_Y", "CONTAINS_X"])

df.groupby("CONTAINS_Y")["ID"].nunique().reset_index()

yields:

  CONTAINS_Y    ID
    False       2
    True        1

I would like a command that yields:

  CONTAINS_Y    ID
    False       2
    True        1
  CONTAINS_X    ID
    False       2
    True        1

Is there a way to group all boolean columns to get unique counts? Or does it have to be done individually?

There are other similar questions (Pandas count(distinct) equivalent), but the solutions haven't worked for what I am trying to achieve.
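
To make "individually" concrete: the route I am hoping to avoid is one groupby per boolean column, stitched back together by hand. A rough sketch of that, using the example df above (the per_col dict and the pd.concat stitching are only illustrative):

# one groupby per boolean column, then combine the pieces by hand
per_col = {
    c: df.groupby(c)["ID"].nunique()
    for c in ["CONTAINS_Y", "CONTAINS_X"]
}
pd.concat(per_col, names=["variable", "value"])
# -> Series indexed by (variable, value) holding the distinct ID counts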

1 Answer

First melt, then groupby:

df.melt(id_vars='ID').groupby(['variable', 'value']).nunique()

#                  ID
#variable   value    
#CONTAINS_X False   2
#           True    1
#CONTAINS_Y False   2
#           True    1

melt reshapes your DataFrame from wide to long format, so that all of the boolean columns can be handled by a single groupby:

df.melt(id_vars='ID')
#    ID    variable  value
#0  ABC  CONTAINS_Y   True
#1  DEF  CONTAINS_Y  False
#2  GHI  CONTAINS_Y  False
#3  ABC  CONTAINS_X  False
#4  DEF  CONTAINS_X  False
#5  GHI  CONTAINS_X   True
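
If you prefer flat columns instead of the MultiIndex above (closer to the layout in the question), one small variation is to select the ID column before counting and then reset the index:

out = (
    df.melt(id_vars='ID')
      .groupby(['variable', 'value'])['ID']
      .nunique()
      .reset_index()
)

#      variable  value  ID
#0   CONTAINS_X  False   2
#1   CONTAINS_X   True   1
#2   CONTAINS_Y  False   2
#3   CONTAINS_Y   True   1
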
ALollz