I'm trying to run a script that requires the datasets Python package. I've tried, unsuccessfully, to install it with pip by calling:

pip install datasets

I know this hasn't worked because when I run the script I get the message:

Traceback (most recent call last):
  File "lda.py", line 2, in <module>
    import lda
  File "/Users/deepthought/lda.py", line 3, in <module>
    import datasets
ImportError: No module named datasets

I've installed Python via Homebrew.

When I run pip install datasets I get the error:

Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/ch/84cpkwc52zx0rsh4k5v4_7h40000gn/T/pip-build-gZWyT3/datasets/

I'm fairly new to scripting in Python and to going under the hood of OS X, so there's a risk I've missed something elementary.

I've been researching and trying to overcome this for about a week now, including looking at similar questions on stackoverflow.com, and haven't gotten past this stage. One of the tutorials I was working through told me to edit ~/.profile.

It currently looks like this:

# The orginal version is saved in .profile.pysave
#PATH="/Library/Frameworks/Python.framework/Versions/3.5/bin:${PATH}"
#export PATH
export PATH=/usr/local/bin:/usr/local/sbin:$PATH

/etc/paths contains:

/usr/local/bin
/usr/bin
/bin
/usr/sbin
/sbin

I'm running OS X El Capitan 10.11.5 (15F34) and Python 2.7.11.

Running brew doctor flagged multiple items, but I have no idea whether any or all of them are worth fixing:

Warning: Your XQuartz (2.7.7) is outdated
Please install XQuartz 2.7.9:
  https://xquartz.macosforge.org

Warning: Python is installed at /Library/Frameworks/Python.framework

Homebrew only supports building against the System-provided Python or a
brewed Python. In particular, Pythons installed to /Library can interfere
with other software installs.

Warning: Unbrewed dylibs were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected dylibs:
    /usr/local/lib/libtcl8.6.dylib
    /usr/local/lib/libtk8.6.dylib

Warning: Unbrewed header files were found in /usr/local/include.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected header files:
    /usr/local/include/fakemysql.h
    /usr/local/include/fakepq.h
    /usr/local/include/fakesql.h
    /usr/local/include/itcl.h
    /usr/local/include/itcl2TclOO.h
    /usr/local/include/itclDecls.h
    /usr/local/include/itclInt.h
    /usr/local/include/itclIntDecls.h
    /usr/local/include/itclMigrate2TclCore.h
    /usr/local/include/itclTclIntStubsFcn.h
    /usr/local/include/mysqlStubs.h
    /usr/local/include/node/ares.h
    /usr/local/include/node/ares_version.h
    /usr/local/include/node/nameser.h
    /usr/local/include/node/node.h
    /usr/local/include/node/node_buffer.h
    /usr/local/include/node/node_internals.h
    /usr/local/include/node/node_object_wrap.h
    /usr/local/include/node/node_version.h
    /usr/local/include/node/openssl/opensslconf.h
    /usr/local/include/node/uv-private/ngx-queue.h
    /usr/local/include/node/uv-private/stdint-msvc2008.h
    /usr/local/include/node/uv-private/tree.h
    /usr/local/include/node/uv-private/uv-bsd.h
    /usr/local/include/node/uv-private/uv-darwin.h
    /usr/local/include/node/uv-private/uv-linux.h
    /usr/local/include/node/uv-private/uv-sunos.h
    /usr/local/include/node/uv-private/uv-unix.h
    /usr/local/include/node/uv-private/uv-win.h
    /usr/local/include/node/uv.h
    /usr/local/include/node/v8-debug.h
    /usr/local/include/node/v8-preparser.h
    /usr/local/include/node/v8-profiler.h
    /usr/local/include/node/v8-testing.h
    /usr/local/include/node/v8.h
    /usr/local/include/node/v8stdint.h
    /usr/local/include/node/zconf.h
    /usr/local/include/node/zlib.h
    /usr/local/include/odbcStubs.h
    /usr/local/include/pqStubs.h
    /usr/local/include/tcl.h
    /usr/local/include/tclDecls.h
    /usr/local/include/tclOO.h
    /usr/local/include/tclOODecls.h
    /usr/local/include/tclPlatDecls.h
    /usr/local/include/tclThread.h
    /usr/local/include/tclTomMath.h
    /usr/local/include/tclTomMathDecls.h
    /usr/local/include/tdbc.h
    /usr/local/include/tdbcDecls.h
    /usr/local/include/tdbcInt.h
    /usr/local/include/tk.h
    /usr/local/include/tkDecls.h
    /usr/local/include/tkPlatDecls.h

Warning: Unbrewed .pc files were found in /usr/local/lib/pkgconfig.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected .pc files:
    /usr/local/lib/pkgconfig/tcl.pc
    /usr/local/lib/pkgconfig/tk.pc

Warning: Unbrewed static libraries were found in /usr/local/lib.
If you didn't put them there on purpose they could cause problems when
building Homebrew formulae, and may need to be deleted.

Unexpected static libraries:
    /usr/local/lib/libtclstub8.6.a
    /usr/local/lib/libtkstub8.6.a

Warning: You have unlinked kegs in your Cellar
Leaving kegs unlinked can lead to build-trouble and cause brews that depend on
those kegs to fail to run properly once built. Run `brew link` on these:
    git
    python3

Warning: Broken symlinks were found. Remove them with `brew prune`:
    /usr/local/bin/github
    /usr/local/lib/perl5/site_perl/Git/I18N.pm
    /usr/local/lib/perl5/site_perl/Git/IndexInfo.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Editor.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Fetcher.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/GlobSpec.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Log.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Memoize/YAML.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Migration.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Prompt.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Ra.pm
    /usr/local/lib/perl5/site_perl/Git/SVN/Utils.pm
    /usr/local/lib/perl5/site_perl/Git/SVN.pm
    /usr/local/lib/perl5/site_perl/Git.pm
    /usr/local/share/git-core/templates/description
    /usr/local/share/git-core/templates/hooks/applypatch-msg.sample
    /usr/local/share/git-core/templates/hooks/commit-msg.sample
    /usr/local/share/git-core/templates/hooks/post-update.sample
    /usr/local/share/git-core/templates/hooks/pre-applypatch.sample
    /usr/local/share/git-core/templates/hooks/pre-commit.sample
    /usr/local/share/git-core/templates/hooks/pre-push.sample
    /usr/local/share/git-core/templates/hooks/pre-rebase.sample
    /usr/local/share/git-core/templates/hooks/prepare-commit-msg.sample
    /usr/local/share/git-core/templates/hooks/update.sample
    /usr/local/share/git-core/templates/info/exclude
    /usr/local/share/man/man1/git-add.1
    /usr/local/share/man/man1/git-am.1
    /usr/local/share/man/man1/git-annotate.1
    /usr/local/share/man/man1/git-apply.1
    /usr/local/share/man/man1/git-archimport.1
    /usr/local/share/man/man1/git-archive.1
    /usr/local/share/man/man1/git-bisect.1
    /usr/local/share/man/man1/git-blame.1
    /usr/local/share/man/man1/git-branch.1
    /usr/local/share/man/man1/git-bundle.1
    /usr/local/share/man/man1/git-cat-file.1
    /usr/local/share/man/man1/git-check-attr.1
    /usr/local/share/man/man1/git-check-ignore.1
    /usr/local/share/man/man1/git-check-mailmap.1
    /usr/local/share/man/man1/git-check-ref-format.1
    /usr/local/share/man/man1/git-checkout-index.1
    /usr/local/share/man/man1/git-checkout.1
    /usr/local/share/man/man1/git-cherry-pick.1
    /usr/local/share/man/man1/git-cherry.1
    /usr/local/share/man/man1/git-citool.1
    /usr/local/share/man/man1/git-clean.1
    /usr/local/share/man/man1/git-clone.1
    /usr/local/share/man/man1/git-column.1
    /usr/local/share/man/man1/git-commit-tree.1
    /usr/local/share/man/man1/git-commit.1
    /usr/local/share/man/man1/git-config.1
    /usr/local/share/man/man1/git-count-objects.1
    /usr/local/share/man/man1/git-credential-cache--daemon.1
    /usr/local/share/man/man1/git-credential-cache.1
    /usr/local/share/man/man1/git-credential-store.1
    /usr/local/share/man/man1/git-credential.1
    /usr/local/share/man/man1/git-cvsexportcommit.1
    /usr/local/share/man/man1/git-cvsimport.1
    /usr/local/share/man/man1/git-cvsserver.1
    /usr/local/share/man/man1/git-daemon.1
    /usr/local/share/man/man1/git-describe.1
    /usr/local/share/man/man1/git-diff-files.1
    /usr/local/share/man/man1/git-diff-index.1
    /usr/local/share/man/man1/git-diff-tree.1
    /usr/local/share/man/man1/git-diff.1
    /usr/local/share/man/man1/git-difftool.1
    /usr/local/share/man/man1/git-fast-export.1
    /usr/local/share/man/man1/git-fast-import.1
    /usr/local/share/man/man1/git-fetch-pack.1
    /usr/local/share/man/man1/git-fetch.1
    /usr/local/share/man/man1/git-filter-branch.1
    /usr/local/share/man/man1/git-fmt-merge-msg.1
    /usr/local/share/man/man1/git-for-each-ref.1
    /usr/local/share/man/man1/git-format-patch.1
    /usr/local/share/man/man1/git-fsck-objects.1
    /usr/local/share/man/man1/git-fsck.1
    /usr/local/share/man/man1/git-gc.1
    /usr/local/share/man/man1/git-get-tar-commit-id.1
    /usr/local/share/man/man1/git-grep.1
    /usr/local/share/man/man1/git-gui.1
    /usr/local/share/man/man1/git-hash-object.1
    /usr/local/share/man/man1/git-help.1
    /usr/local/share/man/man1/git-http-backend.1
    /usr/local/share/man/man1/git-http-fetch.1
    /usr/local/share/man/man1/git-http-push.1
    /usr/local/share/man/man1/git-imap-send.1
    /usr/local/share/man/man1/git-index-pack.1
    /usr/local/share/man/man1/git-init-db.1
    /usr/local/share/man/man1/git-init.1
    /usr/local/share/man/man1/git-instaweb.1
    /usr/local/share/man/man1/git-log.1
    /usr/local/share/man/man1/git-lost-found.1
    /usr/local/share/man/man1/git-ls-files.1
    /usr/local/share/man/man1/git-ls-remote.1
    /usr/local/share/man/man1/git-ls-tree.1
    /usr/local/share/man/man1/git-mailinfo.1
    /usr/local/share/man/man1/git-mailsplit.1
    /usr/local/share/man/man1/git-merge-base.1
    /usr/local/share/man/man1/git-merge-file.1
    /usr/local/share/man/man1/git-merge-index.1
    /usr/local/share/man/man1/git-merge-one-file.1
    /usr/local/share/man/man1/git-merge-tree.1
    /usr/local/share/man/man1/git-merge.1
    /usr/local/share/man/man1/git-mergetool--lib.1
    /usr/local/share/man/man1/git-mergetool.1
    /usr/local/share/man/man1/git-mktag.1
    /usr/local/share/man/man1/git-mktree.1
    /usr/local/share/man/man1/git-mv.1
    /usr/local/share/man/man1/git-name-rev.1
    /usr/local/share/man/man1/git-notes.1
    /usr/local/share/man/man1/git-p4.1
    /usr/local/share/man/man1/git-pack-objects.1
    /usr/local/share/man/man1/git-pack-redundant.1
    /usr/local/share/man/man1/git-pack-refs.1
    /usr/local/share/man/man1/git-parse-remote.1
    /usr/local/share/man/man1/git-patch-id.1
    /usr/local/share/man/man1/git-peek-remote.1
    /usr/local/share/man/man1/git-prune-packed.1
    /usr/local/share/man/man1/git-prune.1
    /usr/local/share/man/man1/git-pull.1
    /usr/local/share/man/man1/git-push.1
    /usr/local/share/man/man1/git-quiltimport.1
    /usr/local/share/man/man1/git-read-tree.1
    /usr/local/share/man/man1/git-rebase.1
    /usr/local/share/man/man1/git-receive-pack.1
    /usr/local/share/man/man1/git-reflog.1
    /usr/local/share/man/man1/git-relink.1
    /usr/local/share/man/man1/git-remote-ext.1
    /usr/local/share/man/man1/git-remote-fd.1
    /usr/local/share/man/man1/git-remote-testgit.1
    /usr/local/share/man/man1/git-remote.1
    /usr/local/share/man/man1/git-repack.1
    /usr/local/share/man/man1/git-replace.1
    /usr/local/share/man/man1/git-repo-config.1
    /usr/local/share/man/man1/git-request-pull.1
    /usr/local/share/man/man1/git-rerere.1
    /usr/local/share/man/man1/git-reset.1
    /usr/local/share/man/man1/git-rev-list.1
    /usr/local/share/man/man1/git-rev-parse.1
    /usr/local/share/man/man1/git-revert.1
    /usr/local/share/man/man1/git-rm.1
    /usr/local/share/man/man1/git-send-email.1
    /usr/local/share/man/man1/git-send-pack.1
    /usr/local/share/man/man1/git-sh-i18n--envsubst.1
    /usr/local/share/man/man1/git-sh-i18n.1
    /usr/local/share/man/man1/git-sh-setup.1
    /usr/local/share/man/man1/git-shell.1
    /usr/local/share/man/man1/git-shortlog.1
    /usr/local/share/man/man1/git-show-branch.1
    /usr/local/share/man/man1/git-show-index.1
    /usr/local/share/man/man1/git-show-ref.1
    /usr/local/share/man/man1/git-show.1
    /usr/local/share/man/man1/git-stage.1
    /usr/local/share/man/man1/git-stash.1
    /usr/local/share/man/man1/git-status.1
    /usr/local/share/man/man1/git-stripspace.1
    /usr/local/share/man/man1/git-submodule.1
    /usr/local/share/man/man1/git-svn.1
    /usr/local/share/man/man1/git-symbolic-ref.1
    /usr/local/share/man/man1/git-tag.1
    /usr/local/share/man/man1/git-tar-tree.1
    /usr/local/share/man/man1/git-unpack-file.1
    /usr/local/share/man/man1/git-unpack-objects.1
    /usr/local/share/man/man1/git-update-index.1
    /usr/local/share/man/man1/git-update-ref.1
    /usr/local/share/man/man1/git-update-server-info.1
    /usr/local/share/man/man1/git-upload-archive.1
    /usr/local/share/man/man1/git-upload-pack.1
    /usr/local/share/man/man1/git-var.1
    /usr/local/share/man/man1/git-verify-pack.1
    /usr/local/share/man/man1/git-verify-tag.1
    /usr/local/share/man/man1/git-web--browse.1
    /usr/local/share/man/man1/git-whatchanged.1
    /usr/local/share/man/man1/git-write-tree.1
    /usr/local/share/man/man1/git.1
    /usr/local/share/man/man1/gitk.1
    /usr/local/share/man/man1/gitremote-helpers.1
    /usr/local/share/man/man1/gitweb.1
    /usr/local/share/man/man3/Git.3pm
    /usr/local/share/man/man3/Git::I18N.3pm
    /usr/local/share/man/man3/Git::SVN::Editor.3pm
    /usr/local/share/man/man3/Git::SVN::Fetcher.3pm
    /usr/local/share/man/man3/Git::SVN::Memoize::YAML.3pm
    /usr/local/share/man/man3/Git::SVN::Prompt.3pm
    /usr/local/share/man/man3/Git::SVN::Ra.3pm
    /usr/local/share/man/man3/Git::SVN::Utils.3pm
    /usr/local/share/man/man5/gitattributes.5
    /usr/local/share/man/man5/githooks.5
    /usr/local/share/man/man5/gitignore.5
    /usr/local/share/man/man5/gitmodules.5
    /usr/local/share/man/man5/gitrepository-layout.5
    /usr/local/share/man/man5/gitweb.conf.5
    /usr/local/share/man/man7/gitcli.7
    /usr/local/share/man/man7/gitcore-tutorial.7
    /usr/local/share/man/man7/gitcredentials.7
    /usr/local/share/man/man7/gitcvs-migration.7
    /usr/local/share/man/man7/gitdiffcore.7
    /usr/local/share/man/man7/gitglossary.7
    /usr/local/share/man/man7/gitnamespaces.7
    /usr/local/share/man/man7/gitrevisions.7
    /usr/local/share/man/man7/gittutorial-2.7
    /usr/local/share/man/man7/gittutorial.7
    /usr/local/share/man/man7/gitworkflows.7

Warning: Your Homebrew is outdated.
You haven't updated for at least 24 hours. This is a long time in brewland!
To update Homebrew, run `brew update`.

How do I make progress in diagnosing the issue with the installation of the datasets package?
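Since brew doctor reports a second Python under /Library/Frameworks in addition to the Homebrew one, a reasonable first diagnostic step is to confirm which interpreter actually runs the script and where it searches for modules: pip installs into the site-packages of the Python it belongs to, which may not be the one running the script. The sketch below uses only the standard library; `locate_module` is a hypothetical helper name. Comparing its "interpreter" line with the output of `pip --version` (which prints pip's location and Python version) should show whether the two match.

```python
import importlib
import sys


def locate_module(name):
    """Return the file `name` is imported from, or None if the import fails.

    Built-in modules have no __file__ and also yield None.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # The interpreter running this script; pip must install into this same one.
    print("interpreter: " + sys.executable)
    print("version:     " + sys.version.split()[0])
    # Directories searched for imports, in order.
    for entry in sys.path:
        print("sys.path:    " + entry)
    print("datasets:    " + str(locate_module("datasets")))
```

If the interpreter differs from the one pip reports, running `python -m pip install datasets` with the script's interpreter is one way to make them agree.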

Update

Here is the script I'm trying to run:

import sys
egg_path = '/usr/local/lib/python2.7/site-packages/datasets-0.0.9-py2.7.egg'
sys.path.append(egg_path)

import numpy as np
import lda
import datasets

X = lda.datasets.load_reuters()
vocab = lda.datasets.load_reuters_vocab()
titles = lda.datasets.load_reuters_titles()
print(X.shape)  # (395, 4258)
print(X.sum())  # 84010
model = lda.LDA(n_topics=20, n_iter=1500, random_state=1)
model.fit(X)  # model.fit_transform(X) is also available
topic_word = model.topic_word_  # model.components_ also works
n_top_words = 8
for i, topic_dist in enumerate(topic_word):
    # Indices of the n_top_words highest-weighted words, largest first
    topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
    print('Topic {}: {}'.format(i, ' '.join(topic_words)))
goose

I also wasn't able to install this package properly with pip install datasets. It seems there is a bug in this particular package.

The DESCRIBE.rst file is missing from the package. To fix this, just download the plain package from PyPI: https://pypi.python.org/pypi/datasets/0.0.9

Then adjust the setup.py file (remove the description).
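A less destructive variant of that edit, sketched under assumptions: instead of deleting the description, setup.py can read the file only when it exists. The package's real setup.py is not shown here, so the helper name `read_long_description` and the commented `setup()` call are illustrative, not the package's actual code.

```python
import io
import os


def read_long_description(filename, base_dir="."):
    """Return the text of `filename`, or "" if the file does not exist.

    Guarding the read keeps `python setup.py egg_info` from failing when a
    file that setup.py expects is missing from the source package.
    """
    path = os.path.join(base_dir, filename)
    if not os.path.isfile(path):
        return ""
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Hypothetical use inside a patched setup.py:
# from setuptools import setup
# setup(
#     name="datasets",
#     version="0.0.9",
#     long_description=read_long_description("DESCRIBE.rst"),
#     # ...the package's other arguments unchanged...
# )
```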

Afterwards, install it with python setup.py install. Don't forget to add the installed package to your Python path!

To do so, I would recommend adding the following to your script:

import sys
# Replace __MODULE_PATH__ with the directory the datasets egg was installed into
egg_path = '__MODULE_PATH__/datasets-0.0.9-py3.5.egg'
sys.path.append(egg_path)
import datasets
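Egg filenames encode the Python version they were built for: the install log quoted in the comments reports datasets-0.0.9-py2.7.egg, while the snippet above names a py3.5 egg. As a hedged sketch, the helpers below (hypothetical names `egg_path_for` and `add_egg_to_path`) build the egg name for whichever interpreter is running and append it only if it exists, so a wrong path or a mismatched interpreter shows up immediately instead of as a bare ImportError. The site-packages directory is the one from that install log.

```python
import os
import sys


def egg_path_for(site_dir, name="datasets", version="0.0.9"):
    # Eggs are named <name>-<version>-py<major>.<minor>.egg, so build the
    # name matching the interpreter that is running this code.
    tag = "py{0}.{1}".format(sys.version_info[0], sys.version_info[1])
    return os.path.join(site_dir, "{0}-{1}-{2}.egg".format(name, version, tag))


def add_egg_to_path(egg_path):
    """Append egg_path to sys.path if it exists; return True when it was usable."""
    if not os.path.exists(egg_path):
        return False
    if egg_path not in sys.path:
        sys.path.append(egg_path)
    return True


if __name__ == "__main__":
    egg = egg_path_for("/usr/local/lib/python2.7/site-packages")
    if add_egg_to_path(egg):
        import datasets  # noqa: F401
    else:
        print("No egg at {0}; check the path and which Python runs this script.".format(egg))
```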

Otherwise, you can add the module's location to your Python path from the shell:

export PYTHONPATH=__MODULE_PATH__:$PYTHONPATH
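Python builds its import search list, sys.path, partly from the PYTHONPATH environment variable; PATH only controls where the shell looks for executables such as python and pip, so adding a module directory to PATH does not make it importable. The sketch below (hypothetical helper `child_sys_path`) starts a fresh interpreter with a temporary directory prepended to each variable in turn and reports whether that directory reaches the child's sys.path.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir, var):
    """Run a new interpreter with extra_dir prepended to env var `var`
    and return that interpreter's sys.path."""
    env = dict(os.environ)
    old = env.get(var)
    env[var] = extra_dir + os.pathsep + old if old else extra_dir
    out = subprocess.check_output(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
    )
    return json.loads(out.decode("utf-8"))


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    print("via PYTHONPATH: {0}".format(demo_dir in child_sys_path(demo_dir, "PYTHONPATH")))
    print("via PATH:       {0}".format(demo_dir in child_sys_path(demo_dir, "PATH")))
```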

Alternatively, you could simply pull the source code from the GitHub repository and include it in your project: https://github.com/realtimeweb/datasets
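For the copy-into-your-project route, one common pattern is to put a project-local directory at the front of sys.path before importing. This is a sketch assuming the repository's package folder is copied into a hypothetical `vendor/` directory next to the script; `add_vendor_dir` is likewise an illustrative name, and the repository's actual layout is not verified here.

```python
import os
import sys


def add_vendor_dir(script_file, subdir="vendor"):
    """Put <directory of script_file>/<subdir> first on sys.path and return it.

    A package copied into that directory is then found before any copy
    installed in site-packages.
    """
    vendor = os.path.join(os.path.dirname(os.path.abspath(script_file)), subdir)
    if vendor not in sys.path:
        sys.path.insert(0, vendor)
    return vendor


# In the script itself (assumes vendor/datasets/ sits next to it):
# add_vendor_dir(__file__)
# import datasets
```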

I hope this helps with your problem. If you have any further questions, just let me know.

Philipp Braun
  • Thanks Philipp. This seems very promising, I completed the install following your steps with no errors. I'm currently trying to work out how to add the package to my python path (this type of thing is also a bit new to me). If this is trivial, please feel free to point out the best way to do so. – goose Jul 03 '16 at 07:31
  • I added a couple of lines on how you usually add a module to your Python Path. Hope it helps. – Philipp Braun Jul 03 '16 at 15:36
  • Thanks for that Philipp - I can't see what to do with what you've added unfortunately. My guess was to add the egg_path... etc. block to the script I'm trying to run, but this gives the error "sys.path.append(egg_path) NameError: name 'sys' is not defined". My guess was to run the export line you gave before running the script, but this still gives the "No module named datasets" message. Sorry for my complete ignorance on this. I seem to have a lot to learn on this. – goose Jul 03 '16 at 21:22
  • You need to `import sys` as well, but then you should be good. And yes, the code is indeed supposed to go into your script. In terms of the Python path, I think there might be a permission issue. That's why I included the script, which usually works. – Philipp Braun Jul 03 '16 at 22:07
  • http://stackoverflow.com/questions/14295680/cannot-import-a-python-module-that-is-definitely-installed-mechanize might help you out if you wish to use the path option instead. – Philipp Braun Jul 03 '16 at 22:14
  • Thanks for your patience Philipp. Am I literally meant to write '__MODULE_PATH__/datasets-0.0.9-py3.5.egg' or am I supposed to work out what the '__MODULE_PATH__... should be? Running it as you've written it gives me the same error and replacing it is throwing up the question of what to replace it with - a path to where my packages are installed? – goose Jul 04 '16 at 22:33
  • No, you need to find the path where your datasets module is located. Hope it works, and please accept my answer. – Philipp Braun Jul 05 '16 at 01:55
  • When I run the setup file it says the following at the end: "Installed /usr/local/lib/python2.7/site-packages/datasets-0.0.9-py2.7.egg" - but when I add this path to the egg_path variable in your example it still doesn't work (scratches head). – goose Jul 05 '16 at 20:38
  • If nothing works you should really just try to download the module folder from github and include it in your project like I suggested above. I am sure one of the solutions I provided above will work for you. Getting stuff to work in Python can sometimes be a bit tricky ;) Wish you all the best. – Philipp Braun Jul 05 '16 at 23:45
  • Just as goose experienced, pip install completed, but I still can't import datasets. If I download the module folder from GitHub, how do I include it in my project? I added an __init__.py under the datasets folder and the import works, but it fails at the datasets.get method. – FrankZhu Oct 02 '18 at 02:10

I just hit the same issue on a Raspberry Pi. I found out the package bug had since been fixed; in my case the error came from a lack of RAM to extract the package properly.

You can fix this by disabling the creation of a cache dir in RAM, by adding the parameter

--no-cache-dir

For example:

pip2 install --user --no-cache-dir datasets
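The same install can be driven from Python, which also ensures pip targets the interpreter that will run the script rather than whichever pip2 is first on PATH. This is a minimal sketch: `pip_install_command` and `in_virtualenv` are hypothetical helpers, and the virtualenv check (sys.prefix differing from sys.base_prefix, or sys.real_prefix existing under older virtualenv releases) reflects the comment below that --user cannot be used inside a virtualenv.

```python
import sys


def in_virtualenv():
    # venv makes sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    return (hasattr(sys, "real_prefix")
            or getattr(sys, "base_prefix", sys.prefix) != sys.prefix)


def pip_install_command(package, user=True):
    """Build a pip install command for the running interpreter, cache disabled."""
    cmd = [sys.executable, "-m", "pip", "install", "--no-cache-dir"]
    # pip rejects --user inside a virtualenv, so only add it outside one.
    if user and not in_virtualenv():
        cmd.append("--user")
    cmd.append(package)
    return cmd


# Usage (this actually runs pip):
# import subprocess
# subprocess.check_call(pip_install_command("datasets"))
```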
Lesto
  • The package is called datasets, not dataset – eggie5 Sep 04 '17 at 14:44
  • The above command has the following error output: Can not perform a '--user' install. User site-packages are not visible in this virtualenv. – FrankZhu Oct 02 '18 at 10:34
  • The error speaks for itself: you can't use the --user flag inside a virtual env. Since you are already in a virtual environment where everything is encapsulated for your application, you don't need special permissions to install "system" packages. – Lesto Oct 02 '18 at 17:33
  • The pip cache directory is on your hard drive, not in the RAM... https://pip.pypa.io/en/stable/reference/pip_install/#caching – mimo Apr 19 '20 at 21:35
  • You are right, but something in the caching system still causes RAM abuse; see https://github.com/pypa/pip/issues/2984. Probably the download is temporarily held in RAM before being dumped to disk, or something similar. – Lesto Apr 20 '20 at 22:41