I have a PySpark DataFrame that looks like this:
----------------------------
id   A    B    C
id1  on   on   on
id1  on   off  on
id1  on   on   on
id1  on   on   on
id1  on   on   off
----------------------------
I am looking for a way to find all unique combinations of values in selected columns and show their counts. The expected output:
----------------------------
id   A    B    C    count
id1  on   on   on   3
id1  on   off  on   1
id1  on   on   off  1
----------------------------
I see that there is a way to do a similar operation in Pandas, but I need PySpark.
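For one fixed set of columns I know I can get this with a plain groupBy and count; here is a minimal sketch that reproduces my example data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Example data matching the table above
data = [
    ("id1", "on", "on", "on"),
    ("id1", "on", "off", "on"),
    ("id1", "on", "on", "on"),
    ("id1", "on", "on", "on"),
    ("id1", "on", "on", "off"),
]
df = spark.createDataFrame(data, ["id", "A", "B", "C"])

# Group by all selected columns and count identical rows
df.groupBy("id", "A", "B", "C").count().show()
```

This produces the expected output above, but only for that single combination of columns.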
UPD: Also, please note that a unique combination of columns A and B is not the same as a combination of A, B, C. I want the counts for every possible combination of columns. Is there a way to achieve this other than grouping by and counting one combination, then another combination, and so on? There are more than 10 columns.
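To make it concrete, this is the kind of repetition I would like to avoid (continuing from the df sketched above):

```python
# One hand-written groupBy per column combination,
# which becomes unmanageable with more than 10 columns
df.groupBy("id", "A", "B").count().show()
df.groupBy("id", "A", "C").count().show()
df.groupBy("id", "B", "C").count().show()
df.groupBy("id", "A", "B", "C").count().show()
# ...and so on for every other subset of columns
```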