
I wrote a function that stores the 10-fold cross-validation scores for each of the stepwise models within each classifier. For example, for Naive Bayes I have two models: one uses only one variable and the other uses two. The same goes for the decision tree model. The function looks something like this:

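import numpy as np
import pandas as pd
from numpy import array
from sklearn import cross_validation  # in later scikit-learn releases, cross_val_score lives in sklearn.model_selection

def crossV(clf):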
    cvOutcome=pd.DataFrame()
    index=pd.DataFrame()
    classifier=pd.DataFrame()
    for i in range(4)[2:]:
        tt=array(tuple(x[1:i] for x in modelDataFullnew))
        qq=array(tuple(x[0] for x in modelDataFullnew))
        scores=cross_validation.cross_val_score(clf, tt, qq, cv=10)*100
        index_i=list(np.repeat(i-1,10))
        classifier_i=list(np.repeat(str(clf)[:-2],10))
        scores=list(scores)
        cvOutcome=cvOutcome.append(scores)
        index=index.append(index_i)
        classifier=classifier.append(classifier_i)
    merge=pd.concat([index,cvOutcome,classifier],axis=1)
    merge.columns=['model','rate','classifier']
    return(merge)

from sklearn.naive_bayes import GaussianNB as gnb
clf_nb=gnb()
from sklearn import tree
clf_dt=tree.DecisionTreeClassifier()

If I call crossV(clf_nb), it gives me a result like

    model   rate    classifier
   1     92.558679   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB

My question is: how can I apply this function to several classifiers and append their results into one long data frame, like

    model   rate    classifier
   1     92.558679   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     92.558381   GaussianNB
   1     93.25       DecisionTree
   1     93.25       DecisionTree

I tried this code, but it does not work:

hhh=[clf_nb,clf_dt]

g=pd.DataFrame()
while i in hhh:
    g=g.append(crossV(i))

I also tried the map function, like

map(crossV,(clf_nb,clf_dt)) 

It works, but it just gives me a list of results, and I don't know how to turn that into a data frame.

MYjx
  • possible duplicate of [add one row in a pandas.DataFrame](http://stackoverflow.com/questions/10715965/add-one-row-in-a-pandas-dataframe) – stachyra Jul 08 '14 at 03:54
  • I tried, but the result is empty. I don't know what goes wrong here. – MYjx Jul 08 '14 at 04:01
  • Did you try `df = pd.concat( (crossV(clf_nb), crossV(clf_dt)) )`? – furas Jul 08 '14 at 04:05
  • This code worked, but what if I have twenty classifiers? I want to write a general function to do that, but somehow it did not work for me... – MYjx Jul 08 '14 at 04:17

1 Answer

clf = [clf_nb, clf_dt]

cross_clf = [ crossV(x) for x in clf ]

df = pd.concat( cross_clf )
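
To show the same pattern without needing scikit-learn, here is a minimal sketch. `fake_cross_v` is a hypothetical stand-in for `crossV` that returns a small frame with the same three columns, and the strings in `names` are placeholders for the classifier objects. `ignore_index=True` is an optional extra (not used above) that renumbers the rows of the combined frame.

```python
import pandas as pd

def fake_cross_v(name):
    # Hypothetical stand-in for crossV: same columns, made-up scores.
    return pd.DataFrame({
        "model": [1, 1],
        "rate": [92.5, 93.0],
        "classifier": [name, name],
    })

# Placeholders for the classifier objects (clf_nb, clf_dt, ...).
names = ["GaussianNB", "DecisionTreeClassifier"]

# One DataFrame per classifier, however many there are.
frames = [fake_cross_v(n) for n in names]

# Stack all frames row-wise; ignore_index renumbers the rows 0..n-1.
long_df = pd.concat(frames, ignore_index=True)

# The result of map() can be concatenated the same way once it is a list.
same_df = pd.concat(list(map(fake_cross_v, names)), ignore_index=True)
```

The list can hold two classifiers or twenty; `pd.concat` is called once on the whole list instead of appending inside a loop.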

EDIT:

An example for the question in your comment:

You need `i = clf_nb` or `i = clf_dt` before the `while` loop for it to start at all:

hhh = [clf_nb, clf_dt]

g = pd.DataFrame()

i = clf_nb

while i in hhh: # as long as `i` (here `clf_nb`) is still in the list `hhh` ...
    g.append( crossV(i) ) # ... append the result for `clf_nb` to `g`

But `i` always equals `clf_nb`, and `clf_nb` is always in the list `hhh`, so you get an endless loop that keeps adding the `clf_nb` result to `g`.
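
The difference between `in` after `for` and `in` after `while` can be seen in a small sketch with plain strings. The names `items`, `visited`, `runs`, and `guarded_runs` are made up for illustration, and the counter in the last loop is only there so the example terminates.

```python
items = ["a", "b"]

# `for x in items` assigns each element to x in turn,
# so the body runs once per element.
visited = []
for x in items:
    visited.append(x)

# `while x in items` only tests membership; it never changes x.
x = "not in the list"
runs = 0
while x in items:  # False right away, so the body is skipped
    runs += 1

# If x is a member, the test stays True on every pass.
# Without the counter guard this loop would never end.
x = "a"
guarded_runs = 0
while x in items and guarded_runs < 3:
    guarded_runs += 1
```

After this runs, `visited` holds both elements, `runs` is still 0, and `guarded_runs` stops at 3 only because of the guard.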

furas
  • thanks! It works!!! But could you please point out why the while loop does not work here? Thanks again! – MYjx Jul 08 '14 at 05:08
  • `while i in hhh` means: if the value of `i` exists in the list `hhh`, then repeat. It is like `if i in hhh`. – furas Jul 08 '14 at 05:13
  • Thanks! But I am still a little confused: while `i` is in `hhh`, why does it not do the concat or append as your code does? Thanks again! :) @furas – MYjx Jul 08 '14 at 13:59
  • First, you have no variable `i`. Second, `i` would have to be set with `i = clf_nb` or `i = clf_dt`. Third, `in` in `for` works differently than in `while` and `if`: `in` in `while` only checks "if `i` is still in the list `hhh`, then ...", whereas `in` in `for` does "get the next element from `hhh` and assign it to `i`, then ...". – furas Jul 08 '14 at 16:20
  • I add some example to answer. – furas Jul 08 '14 at 16:36
  • Thank you so much! It is much clearer now, but actually it is not an endless loop; it returns an empty result @furas – MYjx Jul 09 '14 at 00:21
  • Because your `i` is not equal to `clf_nb` or `clf_dt`, `while i in hhh` gives `False` and `g.append()` is not executed. Try `print( i in hhh )`. – furas Jul 09 '14 at 00:32