I have a dataset "banks". When I group by the column "job" to check the count in each category, I get the following:
index | job | count |
---|---|---|
0 | admin. | 478 |
1 | blue-collar | 946 |
2 | entrepreneur | 168 |
3 | housemaid | 112 |
4 | management | 969 |
5 | retired | 230 |
6 | self-employed | 183 |
7 | services | 417 |
8 | student | 84 |
9 | technician | 768 |
I've also added the header and first 3 rows of the dataset I am using:
age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
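For reference, the per-category count described above can be reproduced in plain pandas. This is only a minimal sketch that uses the three sample rows shown here rather than the full dataset; the names `sample_csv`, `banks_sample` and `sample_job_counts` are placeholders introduced for illustration.

```python
import io

import pandas as pd

# The header and the three sample rows quoted above, parsed as CSV.
sample_csv = """age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
"""
banks_sample = pd.read_csv(io.StringIO(sample_csv))

# Count the rows in each job category; groupby sorts the categories by default.
sample_job_counts = banks_sample.groupby("job").size().reset_index(name="count")
print(sample_job_counts)
```

On the full "banks" dataset the same call would presumably produce the table above.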
My intention is to create a small function that I can reuse for other columns as well, so I tried to write one using the "dfply" package.
import pandas as pd
import dfply
from dfply import *

# creating the function
@dfpipe
def woe_iv(df, variable):
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
    return step1

# invoking the function
banks>>woe_iv(X.job)
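For comparison, the behaviour I am after can be sketched in plain pandas, assuming the column is passed as a string name instead of `X.job`. The helper name `count_by` and the toy frame `banks_demo` are hypothetical and only illustrate the intent; this is not a dfply solution.

```python
import pandas as pd


def count_by(df, column):
    """Return one row per category of `column` with its row count."""
    return df.groupby(column).size().reset_index(name="COUNT")


# Toy data (made up for illustration, not taken from the real dataset).
banks_demo = pd.DataFrame(
    {
        "job": ["services", "management", "services"],
        "marital": ["married", "single", "married"],
    }
)

# The same helper works for any column because the name is a normal argument.
demo_job_counts = count_by(banks_demo, "job")
demo_marital_counts = count_by(banks_demo, "marital")
print(demo_job_counts)
print(demo_marital_counts)
```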
However, this code raises the following error:
@dfpipe
def woe_iv(df,variable):
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
    return step1
banks>>woe_iv(X.job)
Traceback (most recent call last):
  File "<ipython-input-46-d851aeac1927>", line 7, in <module>
    banks>>woe_iv(X.job)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 329, in __call__
    return self.function(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 282, in __call__
    return self.function(df, *args, **kwargs)
  File "<ipython-input-46-d851aeac1927>", line 5, in woe_iv
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 279, in __call__
    args = self._recursive_arg_eval(df, args[1:])
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 241, in _recursive_arg_eval
    return [
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 242, in <listcomp>
    self._symbolic_to_label(df, a) if i in eval_as_label
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 231, in _symbolic_to_label
    return self._evaluator_loop(df, arg, self._evaluate_label)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 225, in _evaluator_loop
    return eval_func(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 181, in _evaluate_label
    arg = self._evaluate(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 175, in _evaluate
    arg = arg.evaluate(df)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 71, in evaluate
    return self.function(context)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 74, in <lambda>
    return Intention(lambda x: getattr(self.function(x), attribute),
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'variable'
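The last frames of the traceback suggest that `X.variable` ends up as a plain attribute lookup for a column literally named "variable" on the DataFrame, rather than for the column passed in as the argument. A minimal sketch that reproduces the same message without dfply, assuming that reading of the traceback is right (the frame `df_demo` is a made-up example):

```python
import pandas as pd

# Toy frame with a "job" column but no column called "variable".
df_demo = pd.DataFrame({"job": ["services", "management"]})

# The same kind of lookup the getattr frame in the traceback performs.
try:
    getattr(df_demo, "variable")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# Looking the column up through a string held in a variable works as expected.
column_name = "job"
job_values = df_demo[column_name].tolist()
print(job_values)
```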
Please let me know what I am missing.