I'd like to first provide a little bit of context.

I have a dataframe that looks like:

ID   Q1  Q2  Q3
A    Y   N   N
A    N   N   Y               
A    N   N   N    
B    Y   Y   N
C    N   N   Y
C    N   N   N
D    N   N   N
D    N   N   Y
D    N   Y   N
E    N   N   N
E    N   Y   N
E    N   N   N

So, there are 5 items: A, B, C, D, and E. I'd like to create a class with nested classes (or attributes) that goes column by column and splits the items based on whether the value is Y or N (a single Y in any of an item's rows puts the item in the Y split). For example, if the first split uses Q1, then A goes with B in the Y split, and C goes with D and E in the N split. We can further split these two groups using Q2: A and B end up in the N and Y splits respectively, C goes to the N split, and D and E go to the Y split. Q3 is then only needed to separate D and E, because every other item is already alone: D goes to Y and E goes to N.

Following this procedure generates a tree structure like this:

        Initial
       /       \     (Using Q1)
      N        Y     N: C,D,E -- Y: A,B
     / \      / \    (Using Q2)
    N   Y    N   Y   NN: C - NY: D,E -- YN: A - YY: B 
       / \           (Using Q3)
      N   Y          NYN: E - NYY: D

So, what I would like is a class that automatically divides the items using the columns until each one is singled out. This requires nested classes or attributes. I imagine something like all, then all.Q1N and all.Q1Y, then all.Q1N.Q2Y, and so on. At the very end (the tree leaves), I want to count how many instances of the item there are. For example, all.Q1N.Q2N.values = 2, since there are two rows with C in them.

I have no idea if this is possible with Python, and if it is, I have no idea how to do it. I've been searching, but haven't found anything I can use. I'd appreciate it if someone could tell me how feasible this is in Python and, if it is, point me to some resource (a special function, a decorator) that can be used to accomplish it. I am not expecting someone to write the code for this (although I wouldn't be angry if someone did); I just want to know what to use so I can do it myself. I'll post the code here if I manage to get it working.

Schach21
    What you want to accomplish can probably be done in Python via nested dictionaries. See [What is the best way to implement nested dictionaries?](https://stackoverflow.com/questions/635483/what-is-the-best-way-to-implement-nested-dictionaries) – martineau Apr 03 '22 at 16:50

1 Answer

You can use a recursive function to build the tree by partitioning your available items on 'Y' and 'N', and then calling the function on non-singleton groups:

from collections import defaultdict

def to_tree(vals, d, l=0):  # l is a running level counter (index of the current Q column)
    t = defaultdict(list)
    for i in vals:
        # i goes to the 'Y' partition if any of its rows in d has 'Y' at level l, else to 'N'
        t[['N', 'Y'][any(j[l] == 'Y' for j in d[i])]].append(i)
    # recurse into a partition only if it still contains more than one item
    return {a: b[0] if len(b) == 1 else to_tree(b, d, l + 1) for a, b in t.items()}

import json
data = [['A', 'Y', 'N', 'N'], ['A', 'N', 'N', 'Y'], ['A', 'N', 'N', 'N'], ['B', 'Y', 'Y', 'N'], ['C', 'N', 'N', 'Y'], ['C', 'N', 'N', 'N'], ['D', 'N', 'N', 'N'], ['D', 'N', 'N', 'Y'], ['D', 'N', 'Y', 'N'], ['E', 'N', 'N', 'N'], ['E', 'N', 'Y', 'N'], ['E', 'N', 'N', 'N']]
d = defaultdict(list)
for a, *b in data:
    d[a].append(b)

print(json.dumps(to_tree([*d], d), indent=4))

Output:

{
    "Y": {
        "N": "A",
        "Y": "B"
    },
    "N": {
        "N": "C",
        "Y": {
            "Y": "D",
            "N": "E"
        }
    }
}
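
If attribute-style access (all.Q1N.Q2Y) and row counts at the leaves are wanted, as described in the question, the same recursion can be wrapped in a small class. The following is only a sketch under some assumptions: the names Node, root, items, and values are hypothetical and not part of the code above, root is used instead of all to avoid shadowing the built-in, and recursion also stops when the columns run out so that indistinguishable items do not raise an IndexError.

```python
from collections import defaultdict


class Node:
    """One group of items; child splits are exposed as attributes such as Q1N / Q1Y."""

    def __init__(self, items, rows, columns, level=0):
        self.items = list(items)
        # Hypothetical 'values' attribute: number of rows owned by the items in this node.
        self.values = sum(len(rows[i]) for i in self.items)
        # Stop once the item is singled out, or when there are no columns left to split on.
        if len(self.items) <= 1 or level >= len(columns):
            return
        parts = defaultdict(list)
        for item in self.items:
            # A single 'Y' in the current column sends the item to the Y split.
            key = 'Y' if any(r[level] == 'Y' for r in rows[item]) else 'N'
            parts[key].append(item)
        for key, group in parts.items():
            # Attribute name is the column name plus the split, e.g. 'Q1' + 'N' -> 'Q1N'.
            setattr(self, f"{columns[level]}{key}", Node(group, rows, columns, level + 1))


data = [['A', 'Y', 'N', 'N'], ['A', 'N', 'N', 'Y'], ['A', 'N', 'N', 'N'],
        ['B', 'Y', 'Y', 'N'], ['C', 'N', 'N', 'Y'], ['C', 'N', 'N', 'N'],
        ['D', 'N', 'N', 'N'], ['D', 'N', 'N', 'Y'], ['D', 'N', 'Y', 'N'],
        ['E', 'N', 'N', 'N'], ['E', 'N', 'Y', 'N'], ['E', 'N', 'N', 'N']]

# Group the answer rows by item, as in the snippet above.
rows = defaultdict(list)
for item, *answers in data:
    rows[item].append(answers)

root = Node(list(rows), rows, ['Q1', 'Q2', 'Q3'])

print(root.Q1N.Q2N.items, root.Q1N.Q2N.values)          # ['C'] 2
print(root.Q1N.Q2Y.Q3Y.items, root.Q1N.Q2Y.Q3Y.values)  # ['D'] 3
```

Only the splits that actually occur are created, so accessing a branch that does not exist (for example root.Q1Y.Q2N.Q3Y) raises AttributeError; getattr with a default is one way to probe for a branch safely.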
Ajax1234