I am trying to write a simple record classifier: I want to add a column whose value classifies each record, and I want to keep the classification rules in a YAML (or similar) file so they are easy to maintain.
I am using pandas because it seems to be the best way to work with CSV records in Python, but I am open to other suggestions. I am new to pandas, and my Python skills are politely described as "why does this look like Perl?"
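For context, a minimal sketch of reading CSV records into a DataFrame and adding the classification column with a default value. The column names `foo` and `bar` come from the rule below; the sample rows and the `"Unclassified"` default are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a file path.
csv_text = """foo,bar
7,Baz Corp
3,one-off
9,other
"""

trans = pd.read_csv(io.StringIO(csv_text))

# Start every record with a default class; the rules overwrite it later.
trans["class"] = "Unclassified"
print(trans)
```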
I have loaded the records into a DataFrame (trans), and I want to apply each rule like this:
trans.loc[(trans['foo'] > 5) & (trans['bar'].str.contains(re.compile('baz|one|two', re.I))), 'class'] = 'Record Type 1'
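A self-contained version of that assignment on made-up sample data, to show the boolean mask on its own. It assumes `case=False` is an acceptable stand-in for `re.compile(..., re.I)` (both give a case-insensitive regex search), and it assigns through `.loc` so the change lands on `trans` itself rather than on a possible intermediate copy.

```python
import pandas as pd

# Hypothetical sample data using the column names from the rule above.
trans = pd.DataFrame({
    "foo": [7, 3, 9, 6],
    "bar": ["Baz Corp", "one-off", "other", "TWO items"],
})
trans["class"] = "Unclassified"

# Boolean mask: foo > 5 AND bar contains baz/one/two, ignoring case.
mask = (trans["foo"] > 5) & trans["bar"].str.contains("baz|one|two", case=False)

# Set the class only on rows where the mask is True.
trans.loc[mask, "class"] = "Record Type 1"
print(trans)
```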
This works interactively. I would like to generate the boolean mask, "(trans['foo'] > 5) & (trans['bar'].str.contains(re.compile('baz|one|two', re.I)))", dynamically from each rule in my YAML file. I have managed to build strings such as:
mask_str = "(trans['foo'] > 5) & (trans['bar'].str.contains(re.compile('baz|one|two', re.I)))"
trans.loc[mask_str, 'class'] = 'Record Type 1'
This doesn't work: pandas treats the string as a literal index label, not as an expression to evaluate. What should I be doing instead?
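One possible direction, sketched under assumptions: instead of generating Python source strings, keep each rule as plain data (the kind of lists and dicts a YAML loader returns) and turn it into a boolean mask with a small table of allowed operations. The rule schema (`class`, `conditions`, `column`, `op`, `value`), the `OPS` table and the helpers `build_mask` and `classify` are all hypothetical names for this sketch.

```python
import operator
from functools import reduce

import pandas as pd

# Hypothetical rule format, e.g. what a YAML rules file might load as.
RULES = [
    {
        "class": "Record Type 1",
        "conditions": [
            {"column": "foo", "op": "gt", "value": 5},
            {"column": "bar", "op": "matches", "value": "baz|one|two"},
        ],
    },
]

# Each allowed op name maps to a function returning a boolean Series.
OPS = {
    "gt": lambda s, v: s > v,
    "lt": lambda s, v: s < v,
    "eq": lambda s, v: s == v,
    # Case-insensitive regex search, like re.compile(v, re.I).
    "matches": lambda s, v: s.str.contains(v, case=False),
}


def build_mask(df, conditions):
    """AND together one boolean Series per condition (needs at least one)."""
    masks = [OPS[c["op"]](df[c["column"]], c["value"]) for c in conditions]
    return reduce(operator.and_, masks)


def classify(df, rules, default="Unclassified"):
    """Return a copy of df with a 'class' column filled in by the rules."""
    out = df.copy()
    out["class"] = default
    for rule in rules:
        out.loc[build_mask(out, rule["conditions"]), "class"] = rule["class"]
    return out


trans = pd.DataFrame({
    "foo": [7, 3, 9, 6],
    "bar": ["Baz Corp", "one-off", "other", "TWO items"],
})
result = classify(trans, RULES)
print(result)
```

With this loop, a later rule overwrites an earlier one where their masks overlap; iterating over `reversed(rules)` would give first-match-wins instead. Restricting rules to the ops in `OPS` also means the YAML file never contains executable code.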