I have a dataset "banks". When I group by the column "job" and count the rows in each category, I get the following:

index jobs count
0 admin. 478
1 blue-collar 946
2 entrepreneur 168
3 housemaid 112
4 management 969
5 retired 230
6 self-employed 183
7 services 417
8 student 84
9 technician. 768

I've also added the header and the first 3 rows of the dataset I am using:

age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no

My intention is to create a small function that I can reuse for other columns as well, so I tried to write one using the "dfply" package.

import pandas as pd
import dfply
from dfply import *

#creating the function

@dfpipe
def woe_iv(df,variable):
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
    return step1

#invoking the function

banks>>woe_iv(X.job)

However, this code gives me the following error:

@dfpipe

def woe_iv(df,variable):
            
            step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
            return step1
banks>>woe_iv(X.job)
Traceback (most recent call last):

  File "<ipython-input-46-d851aeac1927>", line 7, in <module>
    banks>>woe_iv(X.job)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 329, in __call__
    return self.function(*args, **kwargs)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 282, in __call__
    return self.function(df, *args, **kwargs)

  File "<ipython-input-46-d851aeac1927>", line 5, in woe_iv
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 279, in __call__
    args = self._recursive_arg_eval(df, args[1:])

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 241, in _recursive_arg_eval
    return [

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 242, in <listcomp>
    self._symbolic_to_label(df, a) if i in eval_as_label

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 231, in _symbolic_to_label
    return self._evaluator_loop(df, arg, self._evaluate_label)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 225, in _evaluator_loop
    return eval_func(df, arg)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 181, in _evaluate_label
    arg = self._evaluate(df, arg)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 175, in _evaluate
    arg = arg.evaluate(df)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 71, in evaluate
    return self.function(context)

  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 74, in <lambda>
    return Intention(lambda x: getattr(self.function(x), attribute),

  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'variable'

Let me know if I am missing something.

  • Hi, can you please edit and clarify the code with, for example, correct indentation, no blank lines inside the body of the function `def woe_iv (..`, and a blank line between the other calls? Thanks! – IODEV Apr 19 '21 at 07:46
  • Hi, thanks for replying. I have updated the code. Kindly let me know if it helps now. Thanks – Shameek Mukherjee Apr 19 '21 at 08:14
  • Btw, it wasn't me that edited your post initially. Probably some other stackoverflow question reviewer. – IODEV Apr 19 '21 at 08:35

1 Answer

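Thanks for the sample data. The root cause is that woe_iv() uses X.variable, which asks for a column literally named "variable" rather than the column whose name is stored in the parameter. Put brackets around the variable instead (i.e. X[variable]) and pass the column name as a string. Without the brackets you get the error "AttributeError: 'DataFrame' object has no attribute 'variable'".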

@dfpipe
def woe_iv(df, variable):
    return df >> group_by(X[variable]) >> summarize(COUNT=X[variable].count())

banks = pd.read_excel('banks.xlsx')

>> print(banks >> woe_iv('marital'))

   marital  COUNT
0  married      2
1   single      1
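
To see why the dotted form fails, here is a minimal sketch in plain Python with no dfply or pandas involved. The Frame class and both helper functions are hypothetical stand-ins written for this illustration only. They mimic how attribute access uses the literal name after the dot, while bracket access uses the value of the expression inside the brackets.

```python
class Frame:
    """Toy stand-in for a DataFrame: columns are reachable by attribute or by key."""

    def __init__(self, columns):
        self._columns = columns

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails; `name` is the
        # literal identifier written after the dot.
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(f"'Frame' object has no attribute '{name}'")

    def __getitem__(self, key):
        # `key` is whatever the expression inside the brackets evaluates to.
        return self._columns[key]


def column_by_attribute(frame, variable):
    # The parameter is ignored: this always asks for a column named "variable".
    return frame.variable


def column_by_key(frame, variable):
    # The parameter's value ("job", "marital", ...) selects the column.
    return frame[variable]


frame = Frame({"job": ["unemployed", "services", "management"]})

print(column_by_key(frame, "job"))

try:
    column_by_attribute(frame, "job")
except AttributeError as exc:
    print(exc)
```

The first call prints the job column. The second raises an AttributeError that names 'variable', which is the same kind of failure as in the traceback from the question.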

If you don't like dfply pipes, plain pandas offers an alternative form:

>> banks.groupby(['marital']).size().reset_index(name='COUNT')

   marital  COUNT
0  married      2
1   single      1

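Or, if you are familiar with SQL, you can write the query in SQL (for example with pandasql). The snippet below only shows the general form; it uses pd.read_sql_query with an existing database connection conn and an unrelated table: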

SQL_Query = pd.read_sql_query(
'''select product_name, product_price_per_unit, units_ordered,
   ((units_ordered) * (product_price_per_unit)) AS revenue
   from tracking_sales''', conn)
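
Applied to the data from the question, the SQL route could look like the sketch below. It uses Python's built-in sqlite3 module rather than pandasql, so that it runs without extra packages. The table name banks, the two-column subset of the sample rows, and the helper count_by are assumptions made for this illustration.

```python
import sqlite3

# The three sample rows from the question, reduced to two columns for brevity.
rows = [
    ("unemployed", "married"),
    ("services", "married"),
    ("management", "single"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE banks (job TEXT, marital TEXT)")
conn.executemany("INSERT INTO banks VALUES (?, ?)", rows)


def count_by(conn, column):
    """Return (category, count) pairs for one column of the banks table."""
    # Column names cannot be bound as SQL parameters, so check the name
    # against the table schema before interpolating it into the query.
    valid = {info[1] for info in conn.execute("PRAGMA table_info(banks)")}
    if column not in valid:
        raise ValueError(f"unknown column: {column!r}")
    query = (
        f'SELECT "{column}", COUNT(*) AS n FROM banks '
        f'GROUP BY "{column}" ORDER BY "{column}"'
    )
    return conn.execute(query).fetchall()


print(count_by(conn, "marital"))  # [('married', 2), ('single', 1)]
```

Because the column name is passed as a string, the same helper works for job or any other column in the table.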

Sample data:

>> print(banks)

   age         job  marital  education default  balance housing loan  \
0   30  unemployed  married    primary      no     1787      no   no   
1   33    services  married  secondary      no     4789     yes  yes   
2   35  management   single   tertiary      no     1350     yes   no   

    contact  day month  duration  campaign  pdays  previous poutcome   y
0  cellular   19   oct        79         1     -1         0  unknown  no  
1  cellular   11   may       220         1    339         4  failure  no  
2  cellular   16   apr       185         1    330         1  failure  no
IODEV