5

I'm training a classifier for text classification in Python (2.7.11). While it runs I get a deprecation warning, and I can't tell which line of my code causes it. The code still works and gives me results. The warning is:

\AppData\Local\Enthought\Canopy\User\lib\site-packages\sklearn\utils\validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

My code:

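import csv
import sys

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# accuracy() is a helper defined elsewhere (not shown here).
def main():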
    data = []
    folds = 10
    ex = [ [] for x in range(0,10)]
    results = []
    for i,f in enumerate(sys.argv[1:]):
        data.append(csv.DictReader(open(f,'r'),delimiter='\t'))
    for f in data:       
        for i,datum in enumerate(f):
            ex[i % folds].append(datum)
    #print ex
    for held_out in range(0,folds):
        l = []
        cor = []
        l_test = []
        cor_test = []
        vec = []
        vec_test = []

        for i,fold in enumerate(ex):
            for line in fold:
                if i == held_out:
                    l_test.append(line['label'].rstrip("\n"))
                    cor_test.append(line['text'].rstrip("\n"))
                else:
                    l.append(line['label'].rstrip("\n"))
                    cor.append(line['text'].rstrip("\n"))

        vectorizer = CountVectorizer(ngram_range=(1,1),min_df=1)
        X = vectorizer.fit_transform(cor)
        for c in cor:        
            tmp = vectorizer.transform([c]).toarray()
            vec.append(tmp[0])
        for c in cor_test:        
            tmp = vectorizer.transform([c]).toarray()
            vec_test.append(tmp[0])

        clf = MultinomialNB()
        clf.fit(vec, l)
        result = accuracy(l_test,vec_test,clf)
        print result

if __name__ == "__main__":
    main()

Any idea which line raises this warning?

Another issue is that running this code with different data sets gives me exactly the same accuracy, and I can't figure out why.

I also want to use this model in another Python process. I looked at the documentation and found an example that uses the pickle library, but none for joblib. I tried following the same approach, but this gave me errors:

clf = joblib.load('model.pkl') 
pred = clf.predict(vec);

Also, if my data is a CSV file in the format "label \t text \n", what should go in the label column of the test data?

Thanks in advance

BPDESILVA
sareem

6 Answers

19

The 'vec' you pass to clf.fit(vec, l) needs to be 2-D, i.e. of the form [[]] (a list of lists), not just a flat []. This is a quirk that I always forget when I fit models.

Just adding an extra set of square brackets should do the trick!

MrCorote
Heavy Breathing
14

It's this line:

pred = clf.predict(vec);

I used this in my code and it worked:

import numpy as np

# Turn a single instance into a 2-D array of shape (1, n_features)
temp = [2, 70, 90, 1]  # one instance
temp = np.array(temp).reshape((1, -1))
print(model.predict(temp))  # model is the already-fitted classifier
6

Two solutions, same idea: turn your data from 1-D into 2-D.

  1. Just wrap it in an extra pair of brackets:

    vec = [vec]
    
  2. Reshape your data

    import numpy as np
    vec = np.array(vec).reshape(1, -1)
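
To make the difference between the two options visible, here is a minimal sketch that only inspects array shapes. The sample values are invented for illustration, and no classifier is involved; it assumes NumPy is installed.

```python
import numpy as np

# One instance with four features (values are made up for illustration).
sample = [2, 70, 90, 1]

flat = np.array(sample)             # shape (4,): 1-D, the form the warning complains about
wrapped = np.array([sample])        # shape (1, 4): extra pair of brackets
one_sample = flat.reshape(1, -1)    # shape (1, 4): one sample, four features
one_feature = flat.reshape(-1, 1)   # shape (4, 1): four samples, one feature each

print(flat.shape, wrapped.shape, one_sample.shape, one_feature.shape)
```

Wrapping in brackets and reshape(1, -1) give the same shape, so either is fine for a single sample. reshape(-1, 1) is only right when each number is a separate sample with one feature.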
    
perror
5

If you want to find out where the warning is coming from, you can temporarily promote warnings to exceptions. This gives you a full traceback, and with it the lines where your program triggered the warning.

import warnings

with warnings.catch_warnings():
    warnings.simplefilter("error")
    main()

If you run the program from the command line, you can also use the -W flag. More information on warning handling can be found in the Python documentation.
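
As a self-contained sketch of the same technique: the function legacy_call below is a hypothetical stand-in for whatever library call emits the warning, and find_warning_source is just an illustrative helper name, not part of any library. From the command line, the equivalent would be something like python -W error your_script.py.

```python
import traceback
import warnings


def legacy_call():
    # Hypothetical stand-in for the library call that emits the warning.
    warnings.warn("Passing 1d arrays as data is deprecated", DeprecationWarning)
    return "done"


def find_warning_source(func):
    """Run func with warnings promoted to exceptions.

    Returns the traceback text if a warning was raised, otherwise None.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            func()
        except Warning:
            return traceback.format_exc()
    return None


trace = find_warning_source(legacy_call)
print(trace)
```

The printed traceback names the function and line that triggered the warning. Because the filter is set inside catch_warnings(), the normal warning behaviour is restored once the block exits.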

I know this answers only one part of your question, but did you debug your code?

MSeifert
  • No, I didn't debug, since I'm using Canopy on Windows and it needs an upgrade to download the debug tool, and I'm a beginner at all of this! Thanks for your answer – sareem Feb 03 '16 at 10:58
  • If you mean what's on the console, then I'm getting what I've pasted above; there is nothing more than that. If you mean tracing the code, then I haven't tried it. It seems I can import the pdb library, but using it didn't help much (maybe because I'm a beginner)... – sareem Feb 05 '16 at 23:20
0

Since passing a 1-D array is deprecated, try passing a 2-D array as the parameter instead. This might help:

clf = joblib.load('model.pkl') 
pred = clf.predict([vec]);
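
For the "use the model in another process" part, here is a rough end-to-end sketch, assuming scikit-learn and the standalone joblib package are installed (older scikit-learn releases exposed it as sklearn.externals.joblib instead). The training texts, labels and temporary file location are invented for illustration. The sketch saves the vectorizer along with the classifier, on the assumption that new text has to be encoded with the same vocabulary.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data, invented for illustration only.
texts = ["good great fine", "bad awful poor"]
labels = ["pos", "neg"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)  # 2-D sparse matrix: (n_samples, n_features)
clf = MultinomialNB().fit(X, labels)

# Save the vectorizer together with the classifier so that new text
# is encoded with the vocabulary the model was trained on.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
joblib.dump((vectorizer, clf), model_path)

# In the other process: load both objects, then predict on a 2-D input.
loaded_vectorizer, loaded_clf = joblib.load(model_path)
sample = loaded_vectorizer.transform(["great good"])  # shape (1, n_features)
pred = loaded_clf.predict(sample)
print(pred[0])
```

Note that transform() on a list with one string already returns a 2-D matrix with a single row, so it can go straight into predict() without taking tmp[0] and re-wrapping it.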
Bharath M Shetty
0

The predict method expects a 2-D array, so you have to change your input from [] to [[]]. This video explains it, and the link jumps to the exact time: https://youtu.be/KjJ7WzEL-es?t=2602

Shivprasad Koirala