I recently learned about the np.select operation and decided to create a class to experiment with it and also learn a bit more about OOP.
Here is the definition of my class, followed by an example (the class uses the translate function defined at the beginning):
def translate(text, conversion_dict, before = None):
    # Return empty or None input unchanged
    if not text:
        return text
    # Normalize the text first (lower-case by default)
    before = before or str.lower
    t = before(text)
    # Replace every key of conversion_dict by its value
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

class Conditions:
    def __init__(self, base_conditions, composed_conditions, groups, default_group):
        self.base_conditions = base_conditions
        self.composed_conditions = composed_conditions
        self.groups = groups
        self.default_group = default_group
        # Expand aliases such as "(m1)" into the conditions they stand for
        self.readable_conditions = [translate(c, self.base_conditions) for c in self.composed_conditions]
        self.ok_conditions = []

    def run_condition(self, condition, df_name):
        # Prefix each column with the DataFrame's variable name,
        # e.g. "(lev1 < 0)" -> "(ex_df.lev1 < 0)", then evaluate the string
        return eval(condition.replace("(","("+str(df_name)+"."))

    def run_conditions(self, df_name):
        return [self.run_condition(c, df_name) for c in self.readable_conditions]
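To see which strings run_condition actually hands to eval, here is a small standalone trace of the two string rewrites. It is a sketch that copies translate from above; the variable names aliases, composed and prefixed are illustrative only, and the values are taken from the example that follows.

```python
def translate(text, conversion_dict, before=None):
    # Same logic as the translate function above
    if not text:
        return text
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

# Base conditions and composed conditions from the example below
aliases = {"(m1)": "(lev1 < 0)", "(m2)": "(lev2 > 2)", "(m3)": "(lev1 == 0)"}
composed = ["(m1)", "(m2) & (m3)", "(m2)"]

for condition in composed:
    # Step 1: expand the aliases (what __init__ stores in readable_conditions)
    readable = translate(condition, aliases)
    # Step 2: the prefixing that run_condition performs before calling eval
    prefixed = readable.replace("(", "(" + "ex_df" + ".")
    print(condition, "->", prefixed)
```

For instance, "(m2) & (m3)" becomes "(ex_df.lev2 > 2) & (ex_df.lev1 == 0)". The string passed to eval therefore contains the literal name ex_df, which eval has to resolve to an object when it runs.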
Example
First, we create a simple DataFrame to play with:
import pandas as pd
import numpy as np
example = {"lev1" : [-1, -1, -1, 1, 0, -1, 0, 3],
           "lev2" : [-1, 0, 1, 5, 0, 7, 8, 6]}
ex_df = pd.DataFrame.from_dict(example)
print(ex_df)
   lev1  lev2
0    -1    -1
1    -1     0
2    -1     1
3     1     5
4     0     0
5    -1     7
6     0     8
7     3     6
Next, we create an instance of the class, passing in the base conditions, the composed conditions, the groups and the default group:
mycond = Conditions({"(m1)" : "(lev1 < 0)",
                     "(m2)" : "(lev2 > 2)",
                     "(m3)" : "(lev1 == 0)"},
                    ["(m1)", "(m2) & (m3)", "(m2)"],
                    ['A', 'B', 'C'],
                    999)
Finally, we use the np.select operation on our ex_df DataFrame and print the result:
ex_df['MATCH'] = np.select(condlist = mycond.run_conditions("ex_df"),
                           choicelist = mycond.groups,
                           default = mycond.default_group)
print(ex_df)
   lev1  lev2  MATCH
0    -1    -1      A
1    -1     0      A
2    -1     1      A
3     1     5      C
4     0     0    999
5    -1     7      A
6     0     8      B
7     3     6      C
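np.select checks the conditions in order and, for each row, takes the choice belonging to the first condition that is True, so the order of condlist matters. The sketch below reproduces the MATCH column with plain NumPy boolean arrays copied from ex_df instead of the Conditions class. The string default "999" is my own choice, made so that all choices and the default are strings.

```python
import numpy as np

# Column values copied from ex_df above
lev1 = np.array([-1, -1, -1, 1, 0, -1, 0, 3])
lev2 = np.array([-1, 0, 1, 5, 0, 7, 8, 6])

# The three composed conditions, written directly as boolean arrays
m1 = lev1 < 0
m2 = lev2 > 2
m3 = lev1 == 0
condlist = [m1, m2 & m3, m2]

# Each row gets the choice of the FIRST True condition, else the default
match = np.select(condlist, ["A", "B", "C"], default="999")
print(match.tolist())  # ['A', 'A', 'A', 'C', '999', 'A', 'B', 'C']
```

Row 5 satisfies both (m1) and (m2) but is labeled A because (m1) comes first. Row 6 satisfies (m2) & (m3) and (m2) and is labeled B for the same reason.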
As you can see, everything works well, with one exception.
When I import my class from a separate file (conditions.py, which also contains the translate function), it no longer works. Here is how my folders and files are organized:
├── classes
│ ├── __init__.py
│ └── conditions.py
└── test-notebook.ipynb
In my test-notebook.ipynb, I import my class the usual way (which works):
from classes.conditions import *
Then, after creating my DataFrame, I create a new instance of my class (that also works). Finally, when I run the np.select operation, it raises the following error: NameError: name 'ex_df' is not defined.
I don't understand why this raises an error or how to fix it, so I'm looking for an answer to both the why and the how. Here is the full traceback:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-1d4b3ba4a3c0> in <module>
----> 1 ex_df['MATCH'] = np.select(condlist = mycond.run_conditions("ex_df"),
      2                            choicelist = mycond.groups,
      3                            default = mycond.default_group)
      4 print(ex_df)

~/Projects/test/notebooks/classes/conditions.py in run_conditions(self, df_name)
     20
     21     def run_conditions(self, df_name):
---> 22         return [self.run_condition(c, df_name) for c in self.readable_conditions]

~/Projects/test/notebooks/classes/conditions.py in <listcomp>(.0)
     20
     21     def run_conditions(self, df_name):
---> 22         return [self.run_condition(c, df_name) for c in self.readable_conditions]

~/Projects/test/notebooks/classes/conditions.py in run_condition(self, condition, df_name)
     17
     18     def run_condition(self, condition, df_name):
---> 19         return eval(condition.replace("(","("+str(df_name)+"."))
     20
     21     def run_conditions(self, df_name):

~/Projects/test/notebooks/classes/conditions.py in <module>

NameError: name 'ex_df' is not defined
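The name-lookup rule involved can be reproduced without pandas or separate files. When eval receives only an expression, it resolves names using the globals of the module in which the calling function was defined (plus that function's locals), not the namespace of whoever called the function. `from classes.conditions import *` copies the class into the notebook, but its methods keep the conditions module as their global namespace. The sketch below simulates that module with types.ModuleType; MODULE_SOURCE, run_by_name and run_with_object are hypothetical names, and a dict stands in for the DataFrame.

```python
import types

# Hypothetical stand-in for classes/conditions.py: these functions are compiled
# inside their own module namespace, just like functions in an imported module.
MODULE_SOURCE = '''
def run_by_name(expr):
    # No namespaces passed: eval() uses THIS module's globals and this
    # function's locals, never the namespace of the caller.
    return eval(expr)

def run_with_object(expr, df):
    # The object itself is passed in and exposed to eval() under a fixed name,
    # so the caller's variable name no longer matters.
    return eval(expr, {}, {"df": df})
'''

conditions = types.ModuleType("conditions")
exec(MODULE_SOURCE, conditions.__dict__)

# Defined in the caller's globals (the notebook's, in the question)
ex_df = {"lev1": 0}

try:
    conditions.run_by_name("ex_df")
except NameError as err:
    print("NameError:", err)  # NameError: name 'ex_df' is not defined

print(conditions.run_with_object("df['lev1'] == 0", ex_df))  # True
```

Applied to the class, this suggests passing the DataFrame object itself rather than its name. pandas also provides DataFrame.eval, which evaluates an expression such as "(lev2 > 2) & (lev1 == 0)" against the DataFrame's own columns, so readable_conditions could be evaluated without the "(" prefixing step.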