Before I start: I am not sure which terminology to use, so please don't be harsh about my misuse of terms like "unique" and "duplicate".
I have a pandas DataFrame with three columns, A, B and C. Two rows are considered the same if they have the same value in at least one of the columns A, B or C. Suppose we have this table:
row num | A | B | C |
---|---|---|---|
1 | A1 | B1 | C1 |
2 | A1 | B2 | C2 |
3 | A2 | B2 | C3 |
4 | A3 | B3 | C3 |
Rows 1 and 2 are the same because column A matches, rows 2 and 3 because of B, and rows 3 and 4 because of C. Since row 1 is a duplicate of row 2, which is a duplicate of row 3, which is a duplicate of row 4, I expect the number of unique records here to be 1.
How would I write Python pandas code to calculate that?
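
For what it's worth, the chaining described above (1 links to 2, 2 to 3, 3 to 4) looks like counting connected components, where rows are nodes and a shared value in any column is an edge. Below is a minimal sketch of that idea using a small union-find over row positions. It is one possible approach, not necessarily the best one. The helper name `count_linked_groups` is made up for illustration and is not a pandas function, and the sketch assumes that missing values should never link two rows.

```python
import pandas as pd


def count_linked_groups(df, cols):
    """Count groups of rows linked, directly or transitively,
    by sharing a value in at least one of `cols`."""
    # parent[i] is the union-find parent of the row at position i.
    parent = list(range(len(df)))

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for col in cols:
        first_seen = {}  # value -> position of the first row holding it
        for pos, value in enumerate(df[col].tolist()):
            if pd.isna(value):
                continue  # assumption: missing values never link rows
            if value in first_seen:
                # Merge this row's group with the group of the earlier row.
                parent[find(pos)] = find(first_seen[value])
            else:
                first_seen[value] = pos

    # Each distinct root is one group of "same" rows.
    return len({find(i) for i in range(len(df))})


# The table from the question.
df = pd.DataFrame({
    "A": ["A1", "A1", "A2", "A3"],
    "B": ["B1", "B2", "B2", "B3"],
    "C": ["C1", "C2", "C3", "C3"],
})

print(count_linked_groups(df, ["A", "B", "C"]))  # 1
```

Values are only compared within the same column, so an "X" in column A would not link to an "X" in column B. If cross-column matches should count too, the `first_seen` dictionary would need to be shared across columns instead of reset per column.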