
I recently learned about the np.select operation and decided to create a class to experiment with it and also learn a bit more about OOP.

Here is the definition of my class, followed by an example (the class uses the translate function defined at the beginning):

def translate(text, conversion_dict, before = None):
    if not text: return text
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

class Conditions:
    def __init__(self, base_conditions, composed_conditions, groups, default_group):
        self.base_conditions = base_conditions
        self.composed_conditions = composed_conditions
        self.groups = groups
        self.default_group = default_group
        self.readable_conditions = [translate(c, self.base_conditions) for c in self.composed_conditions]
        self.ok_conditions = []  

    def run_condition(self, condition, df_name):
        return eval(condition.replace("(","("+str(df_name)+"."))

    def run_conditions(self, df_name):
        return [self.run_condition(c, df_name) for c in  self.readable_conditions]
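To see what eval() is actually handed, the two string-building steps can be traced without pandas. This is a minimal sketch that reuses the translate function and the condition mapping from the example below; the variable names readable and expressions are illustrative only.

```python
def translate(text, conversion_dict, before=None):
    # Same logic as above: lowercase, then substitute every key by its value
    if not text:
        return text
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

base_conditions = {"(m1)": "(lev1 < 0)",
                   "(m2)": "(lev2 > 2)",
                   "(m3)": "(lev1 == 0)"}
composed_conditions = ["(m1)", "(m2) & (m3)", "(m2)"]

# Step 1: what Conditions.__init__ stores in readable_conditions
readable = [translate(c, base_conditions) for c in composed_conditions]
print(readable)
# ['(lev1 < 0)', '(lev2 > 2) & (lev1 == 0)', '(lev2 > 2)']

# Step 2: what run_condition passes to eval() when df_name is "ex_df"
expressions = [c.replace("(", "(ex_df.") for c in readable]
print(expressions)
# ['(ex_df.lev1 < 0)', '(ex_df.lev2 > 2) & (ex_df.lev1 == 0)', '(ex_df.lev2 > 2)']
```

The final strings only contain the *name* ex_df, so eval() still has to look that name up when it runs.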

Example

First, we create a simple DataFrame to play with:

import pandas as pd
import numpy as np

example = {"lev1" : [-1, -1, -1, 1, 0 , -1, 0 , 3],
           "lev2" : [-1, 0 , 1 , 5 , 0 , 7 , 8 , 6]}

ex_df = pd.DataFrame.from_dict(example)
print(ex_df)

   lev1  lev2
0    -1    -1
1    -1     0
2    -1     1
3     1     5
4     0     0
5    -1     7
6     0     8
7     3     6

Next, we create a new instance of our class where we pass our conditions and groups:

mycond = Conditions({"(m1)" : "(lev1 < 0)",
                     "(m2)" : "(lev2 > 2)", 
                     "(m3)" : "(lev1 == 0)"},
                    ["(m1)", "(m2) & (m3)", "(m2)"],
                    ['A', 'B', 'C'],
                    999)

Finally, we use the np.select operation on our ex_df DataFrame and print the result:

ex_df['MATCH'] = np.select(condlist = mycond.run_conditions("ex_df"), 
                           choicelist = mycond.groups, 
                           default = mycond.default_group) 
print(ex_df)

   lev1  lev2 MATCH
0    -1    -1     A
1    -1     0     A
2    -1     1     A
3     1     5     C
4     0     0   999
5    -1     7     A
6     0     8     B
7     3     6     C

As you can see, everything works well with one exception.

When I import my class from a separate file (conditions.py, which also contains the translate function), it no longer works. Here is how my folders/files are organized:

├── classes
│   ├── __init__.py
│   └── conditions.py
└── test-notebook.ipynb

In my test-notebook.ipynb, I import my class the usual way (which works):

from classes.conditions import *

Then, after creating my DataFrame, I create a new instance of my class (that also works). Finally, when I run the np.select operation, it raises the following error: NameError: name 'ex_df' is not defined.

I have no idea why this raises an error or how to fix it. I'm looking for an answer to both the why and the how. Here is the traceback of the error, if needed:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-1d4b3ba4a3c0> in <module>
----> 1 ex_df['MATCH'] = np.select(condlist = mycond.run_conditions("ex_df"), 
      2                            choicelist = mycond.groups,
      3                            default = mycond.default_group) 
      4 print(ex_df)

~/Projects/test/notebooks/classes/conditions.py in run_conditions(self, df_name)
     20 
     21     def run_conditions(self, df_name):
---> 22         return [self.run_condition(c, df_name) for c in  self.readable_conditions]

~/Projects/test/notebooks/classes/conditions.py in <listcomp>(.0)
     20 
     21     def run_conditions(self, df_name):
---> 22         return [self.run_condition(c, df_name) for c in  self.readable_conditions]

~/Projects/test/notebooks/classes/conditions.py in run_condition(self, condition, df_name)
     17 
     18     def run_condition(self, condition, df_name):
---> 19         return eval(condition.replace("(","("+str(df_name)+"."))
     20 
     21     def run_conditions(self, df_name):

~/Projects/test/notebooks/classes/conditions.py in <module>

NameError: name 'ex_df' is not defined
glpsx

2 Answers


I think this will solve the problem.

First file: Stackoverflow2.py

import pandas as pd
import numpy as np


def translate(text, conversion_dict, before = None):
    if not text: return text
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

class Conditions:
    def __init__(self, base_conditions, composed_conditions, groups, default_group):
        self.base_conditions = base_conditions
        self.composed_conditions = composed_conditions
        self.groups = groups
        self.default_group = default_group
        self.readable_conditions = [translate(c, self.base_conditions) for c in self.composed_conditions]
        self.ok_conditions = []  

    def run_condition(self, condition, df_name):
        return eval(condition.replace("(","("+str(df_name)+"."))

    def run_conditions(self, df_name):
        return [self.run_condition(c, df_name) for c in  self.readable_conditions]

class DataFrame(Conditions):

    def __init__(self):
        pass
    def makeDataFrame(self):

        example = {"lev1" : [-1, -1, -1, 1, 0 , -1, 0 , 3],
        "lev2" : [-1, 0 , 1 , 5 , 0 , 7 , 8 , 6]}

        ex_df = pd.DataFrame.from_dict(example)

        return ex_df



obj=DataFrame()

print(obj.makeDataFrame())

# mycond = Conditions({"(m1)" : "(lev1 < 0)",
#                      "(m2)" : "(lev2 > 2)", 
#                      "(m3)" : "(lev1 == 0)"},
#                     ["(m1)", "(m2) & (m3)", "(m2)"],
#                     ['A', 'B', 'C'],
#                     999)

# ex_df['MATCH'] = np.select(condlist = mycond.run_conditions("ex_df"), 
#                            choicelist = mycond.groups, 
#                            default = mycond.default_group) 
# print(ex_df)

Second file: Stackoverflow3.py

from Stackoverflow2 import *

print(obj.makeDataFrame())

The problem is that global variables in Python do not work as they do in C or C++: each module has its own global namespace. So make them instance variables instead.

For further information, see:

https://stackoverflow.com/questions/15959534/visibility-of-global-variables-in-imported-modules
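The per-module globals can be reproduced without any extra files. When eval() is called without explicit namespaces, it looks names up in the calling function's locals and then in the globals of the module where that function was defined, not in the caller's module. The sketch below builds a hypothetical stand-in for classes/conditions.py in memory with types.ModuleType; the module name and the lookup function are illustrative only.

```python
import types

# Hypothetical stand-in for classes/conditions.py, created in memory
source = '''
def lookup(name):
    # No globals/locals passed: eval() uses this function's locals
    # and the globals of *this* module, not the caller's
    return eval(name)
'''
conditions = types.ModuleType("conditions")
exec(source, conditions.__dict__)

ex_df = "defined in the caller's module"  # plays the role of the notebook's ex_df

try:
    conditions.lookup("ex_df")
    message = "no error"
except NameError as err:
    message = str(err)

print(message)  # name 'ex_df' is not defined

# For illustration only: once the object is in the module's namespace,
# the very same lookup succeeds
conditions.ex_df = ex_df
print(conditions.lookup("ex_df"))
```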
Alex Ferguson

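In your context, the variable ex_df is not accessible as a global or local variable, i.e., the method run_condition (where eval() is called) receives the name "ex_df" but has no way to know what it refers to: ex_df lives in the notebook's namespace, not in the globals of conditions.py or in the method's locals.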

You need to pass the reference to the DataFrame instead of its name:

ex_df['MATCH'] = np.select(condlist = mycond.run_conditions(ex_df), 
                           choicelist = mycond.groups, 
                           default = mycond.default_group)

Then change the definition of run_condition to accept the DataFrame instead of a variable name:

def run_condition(self, condition, df):
    return eval(condition.replace("(","(df."))

def run_conditions(self, df):
    return [self.run_condition(c, df) for c in  self.readable_conditions]

Explanation: inside the function run_condition, the variable is called df. There is no "ex_df" there; that is just the name you gave it somewhere else. When eval() runs at that point, the interpreter knows the object by the name df, because it is an argument of the function.
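Putting both changes together, a self-contained sketch of the corrected class might look like this. It is assembled from the question and this answer; the default group is written as the string "999" here (an assumption) so that all choices share one string dtype, and the unused ok_conditions attribute is omitted.

```python
import numpy as np
import pandas as pd


def translate(text, conversion_dict, before=None):
    # Lowercase the text, then replace each key by its value
    if not text:
        return text
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t


class Conditions:
    def __init__(self, base_conditions, composed_conditions, groups, default_group):
        self.base_conditions = base_conditions
        self.composed_conditions = composed_conditions
        self.groups = groups
        self.default_group = default_group
        self.readable_conditions = [translate(c, base_conditions)
                                    for c in composed_conditions]

    def run_condition(self, condition, df):
        # 'df' is a parameter of this method, so eval() finds it in the
        # method's local namespace, whichever module the class lives in
        return eval(condition.replace("(", "(df."))

    def run_conditions(self, df):
        return [self.run_condition(c, df) for c in self.readable_conditions]


ex_df = pd.DataFrame({"lev1": [-1, -1, -1, 1, 0, -1, 0, 3],
                      "lev2": [-1, 0, 1, 5, 0, 7, 8, 6]})

mycond = Conditions({"(m1)": "(lev1 < 0)",
                     "(m2)": "(lev2 > 2)",
                     "(m3)": "(lev1 == 0)"},
                    ["(m1)", "(m2) & (m3)", "(m2)"],
                    ["A", "B", "C"],
                    "999")

# Pass the DataFrame object itself, not the string "ex_df"
ex_df["MATCH"] = np.select(condlist=mycond.run_conditions(ex_df),
                           choicelist=mycond.groups,
                           default=mycond.default_group)
print(ex_df)
```

Because the object travels as an argument, this version should behave the same whether the class is defined in the notebook or imported from classes/conditions.py.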

augustomen