I would like to filter a dataframe that contains association rule results. I want to keep the rows whose antecedents contain a given element, such as H or L in my case. The antecedents are of type frozenset. I tried the following to build Hrules, but it is not working.

Hrules=fdem_rules['H'  in fdem_rules['antecedents']]
Hrules=fdem_rules[frozenset({'H'})  in fdem_rules['antecedents']] 

Neither attempt worked.

In the following example, I need only rows 46 and 89, as they contain H.

import pandas as pd
df = pd.DataFrame({'antecedents': [frozenset({'N', 'M', '60'}), frozenset({'H', 'AorE'}), frozenset({'0-35', 'H', 'AorE', '60'}), frozenset({'AorE', 'M', '60', '0'}), frozenset({'0-35', 'F'})]}, index=[75, 46, 89, 103, 38])
             antecedents
75            (N, M, 60)
46             (H, AorE)
89   (0-35, H, AorE, 60)
103     (AorE, M, 60, 0)
38             (0-35, F)
mozway
Saif
  • please provide a minimal reproducible example – mozway Jan 05 '22 at 08:17
  • I provided 2 examples of antecedent frozensets. I want to filter out the rows where H does not appear in the antecedents. Please let me know what you mean by reproducible example. – Saif Jan 05 '22 at 08:21
  • Well, if I were to copy your code, this wouldn't give me a dataframe but an error. I don't know what "fdem_rules is dataframe results of a apriori algo" is. A [minimal reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) means that I can reproduce your data: copy code -> paste -> get working object. – mozway Jan 05 '22 at 08:25
  • thank you, I edited it. I hope it is ok now. – Saif Jan 05 '22 at 08:42
  • I further simplified your question to only keep the minimal relevant data ;) – mozway Jan 05 '22 at 08:54

1 Answer

set/frozenset methods

You can use apply with a set/frozenset method. Here, to check whether at least one of H or L is present, use the negation of {'H', 'L'}.isdisjoint:

match = {'H', 'L'}
df['H or L'] = ~df['antecedents'].apply(match.isdisjoint)

A much faster variant of the above is to use a list comprehension:

match = {'H', 'L'}
df['H or L'] = [not match.isdisjoint(x) for x in df['antecedents']]
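As a self-contained sketch of the two variants above, the block below rebuilds the question's example frame and checks that both give the same mask. The index labels are copied from the printed output in the question, and the names `mask_apply` and `mask_listcomp` are illustrative only.

```python
import pandas as pd

# Example data from the question; index labels taken from the printed frame.
df = pd.DataFrame(
    {'antecedents': [frozenset({'N', 'M', '60'}),
                     frozenset({'H', 'AorE'}),
                     frozenset({'0-35', 'H', 'AorE', '60'}),
                     frozenset({'AorE', 'M', '60', '0'}),
                     frozenset({'0-35', 'F'})]},
    index=[75, 46, 89, 103, 38],
)

match = {'H', 'L'}

# isdisjoint is True when the two sets share no element, so it is negated.
mask_apply = ~df['antecedents'].apply(match.isdisjoint)
mask_listcomp = [not match.isdisjoint(x) for x in df['antecedents']]

print(mask_apply.tolist())
print(mask_listcomp)
```

Both masks should mark only rows 46 and 89, the two rows that contain H.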

explode+isin+aggregate

Another option is to explode the frozenset, use isin, and aggregate the result with groupby+any:

match = {'H', 'L'}
df['H or L'] = df['antecedents'].explode().isin(match).groupby(level=0).any()
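To make the chained call easier to follow, here is a sketch that splits it into its three steps on a two-row subset of the example data. The intermediate names `exploded`, `hits` and `mask` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'antecedents': [frozenset({'H', 'AorE'}), frozenset({'0-35', 'F'})]},
    index=[46, 38],
)
match = {'H', 'L'}

# Step 1: one row per set element; the original index label is repeated.
exploded = df['antecedents'].explode()

# Step 2: True where the individual element is one of the wanted items.
hits = exploded.isin(match)

# Step 3: collapse back to one value per index label (True if any element matched).
mask = hits.groupby(level=0).any()

print(exploded.index.tolist())
print(mask)
```

Because step 3 groups on the index, this approach presumably relies on the index labels being unique; rows sharing a label would be merged into one result.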

output:

>>> df[['antecedents', 'H or L']]
             antecedents  H or L
75            (N, M, 60)   False
46             (H, AorE)    True
89   (0-35, H, AorE, 60)    True
103     (AorE, M, 60, 0)   False
38             (0-35, F)   False

slicing matching rows

match = {'H', 'L'}
idx = [not match.isdisjoint(x) for x in df['antecedents']]
df[idx]

output:

            antecedents consequents other_cols
46            (H, AorE)         (N)        ...
89  (0-35, H, AorE, 60)         (0)        ...
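The question asks for rows containing H only, so the same slicing idea can be wrapped in small helpers. The sketch below is one possible way to do it: `rows_with_any` and `rows_with_all` are hypothetical names, not pandas functions, and the "all items" variant using `issubset` is an extension that the original answer does not cover.

```python
import pandas as pd

def rows_with_any(frame, items, column='antecedents'):
    """Keep rows whose set in `column` shares at least one element with `items`."""
    items = set(items)
    return frame[[not items.isdisjoint(s) for s in frame[column]]]

def rows_with_all(frame, items, column='antecedents'):
    """Keep rows whose set in `column` contains every element of `items`."""
    items = set(items)
    return frame[[items.issubset(s) for s in frame[column]]]

# Example data from the question; index labels taken from the printed frame.
df = pd.DataFrame(
    {'antecedents': [frozenset({'N', 'M', '60'}),
                     frozenset({'H', 'AorE'}),
                     frozenset({'0-35', 'H', 'AorE', '60'}),
                     frozenset({'AorE', 'M', '60', '0'}),
                     frozenset({'0-35', 'F'})]},
    index=[75, 46, 89, 103, 38],
)

print(rows_with_any(df, {'H'}).index.tolist())
print(rows_with_all(df, {'H', '60'}).index.tolist())
```

With the example data, the first call should keep rows 46 and 89, and the second only row 89.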

mozway