2

I am writing a Python program on Google App Engine that calculates tf-idf using TfidfVectorizer from sklearn.

I have added the sklearn library to my project and import it as:

from sklearn.feature_extraction.text import TfidfVectorizer

However, the import fails with "No module named _check_build", even though that module is part of the library I added.

Note: the same code works fine in plain Python outside App Engine, so there is nothing wrong with the syntax or the imports. The problem only appears on GAE.

Do you know any way to solve this issue?

Mina

2 Answers

4

You can't. sklearn has a lot of C-based dependencies, and typically any module whose name starts with a leading underscore (_) is a compiled binary module.

That is why you are getting the "No module named _check_build" error: the compiled extension cannot be loaded on App Engine.

I seriously doubt you will get it to run even if you fake some of the C libraries, unless they have pure-Python analogues.

I have done this in the past, but only with libraries that shipped a pure-Python implementation alongside their C-based performance version.
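To see which files in a bundled library are compiled extensions rather than pure Python, one rough check is to compare file names against the extension suffixes the interpreter recognises. This is only a sketch for Python 3.3+ (it relies on importlib.machinery.EXTENSION_SUFFIXES); the helper name looks_like_extension_module is made up for illustration and is not part of sklearn or App Engine.

```python
import importlib.machinery
import os


def looks_like_extension_module(filename):
    """Return True if the file name ends with a compiled-extension suffix.

    Hypothetical helper: it only inspects the name, it does not try to
    load the file.
    """
    name = os.path.basename(filename)
    return any(
        name.endswith(suffix)
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )
```

Running this over the files of a vendored package gives a quick idea of how much of it depends on compiled code.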
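For tf-idf specifically, the computation is simple enough to write without any compiled code. The following is a minimal pure-Python sketch, not a drop-in replacement for TfidfVectorizer: the function names tokenize and tfidf are made up for illustration, the tokenizer is a simplified assumption, and the weighting (smoothed idf, L2-normalised rows) is only meant to resemble sklearn's defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed idf: ln((1 + n) / (1 + df)) + 1.
    # float() keeps the division correct on Python 2 as well.
    idf = {
        term: math.log(float(1 + n_docs) / (1 + count)) + 1.0
        for term, count in df.items()
    }

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {term: tf * idf[term] for term, tf in counts.items()}
        # L2-normalise so every non-empty document has unit length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {term: w / norm for term, w in weights.items()}
        vectors.append(weights)
    return vectors
```

Calling tfidf(["the cat sat", "the dog sat", "the cat ran"]) returns three dictionaries; in the second one, "dog" gets a higher weight than "the" because it occurs in fewer documents. Sparse dictionaries are used instead of a matrix so that no numpy or scipy is required.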

Tim Hoffman
  • Thanks for the answer, but do you know of a Python library that I can use for tf-idf on Google App Engine? – Mina Apr 19 '14 at 11:14
  • Sorry, I have no idea what it does. You would be better off running this under GCE or EC2. When managed VMs are in general release, you can revisit App Engine. – Tim Hoffman Apr 19 '14 at 11:24
0

If you are not using any GAE-specific tools, try deploying your app on Heroku. It lets you deploy a whole virtual environment with all of its installed libraries. In particular, scikit-learn works on Heroku just fine. Check this GitHub repo for an example.

MostafaMV