I am trying to upload a simple Python package to PyPI. To test it, I first uploaded it to test.pypi.org. When I install the package with pip and use it, I get the error FileNotFoundError: [Errno 2] File b'../data/spam_collection.csv' does not exist: b'../data/spam_collection.csv'. I implemented the following based on similar questions on Stack Overflow and on the documentation. I have tried many variations; this is already version 11 of my package. What am I doing wrong?

I use package_data to include the CSV file in the package.

setup.py

import setuptools
import string
import ast
import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk import sent_tokenize
from nltk import ngrams

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="spamclassifier",
    version="0.1.1",
    author="#####",
    author_email="###########",
    description="A bigram approach for classifying Spam and Ham messages",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="#############",
    packages=setuptools.find_packages(),
    include_package_data=True,
    package_data={'data': ['data/spam_collection.csv']},
    install_requires=["nltk", "pandas"],
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
)

My project structure

[Image: screenshot of the project structure]

How I call this csv in the python file

 def classify(self):
     fullCorpus = pd.read_csv("../data/spam_collection.csv", sep="\t", header=None)
     fullCorpus.columns = ["lable", "body_text"]

MANIFEST.in

include README.md
include LICENSE
include data/spam_collection.csv

After installing the package with pip, I ran pip show -f spamclassifier to list the files in the package, and the CSV file is not listed. The output was:

Name: spamclassifier
Version: 0.1.3
Summary: A bigram approach for classifying Spam and Ham messages
Home-page: XXXXXXXX
Author: XXXXXXX
Author-email: XXXXXX 
License: UNKNOWN
Location: /home/kabilesh/PycharmProjects/TestPypl3/venv/lib/python3.6/site-packages
Requires: pandas, nltk
Required-by: 
Files:
  spamclassifier-0.1.3.dist-info/INSTALLER
  spamclassifier-0.1.3.dist-info/LICENSE
  spamclassifier-0.1.3.dist-info/METADATA
  spamclassifier-0.1.3.dist-info/RECORD
  spamclassifier-0.1.3.dist-info/WHEEL
  spamclassifier-0.1.3.dist-info/top_level.txt
  spamclassifier/SpamClassifier.py
  spamclassifier/__init__.py
  spamclassifier/__pycache__/SpamClassifier.cpython-36.pyc
  spamclassifier/__pycache__/__init__.cpython-36.pyc
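
The same check can be scripted. The following is a minimal sketch, assuming Python 3.8 or newer (importlib.metadata is not in the standard library of the Python 3.6 shown in the output above); the helper name installed_files is hypothetical. It lists the files recorded for an installed distribution, optionally filtered by suffix, which makes it easy to see whether any .csv file was installed.

```python
from importlib import metadata


def installed_files(dist_name, suffix=""):
    """Return the recorded file paths of an installed distribution.

    Only paths ending with `suffix` are returned. An empty list is
    returned when the distribution is not installed or has no file record.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return []
    # metadata.files() may return None when no RECORD file is available.
    return [str(f) for f in (files or []) if str(f).endswith(suffix)]


if __name__ == "__main__":
    # For the package in this question, an empty list here would match
    # the pip show -f output above, where no CSV file is listed.
    print(installed_files("spamclassifier", ".csv"))
```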
Kabilesh

1 Answer

Try this:

packages=['spamclassifier'],
package_dir={'spamclassifier': 'spamclassifier'},
package_data={'spamclassifier': ['data/*']},
include_package_data=True

Keep everything else the same. Hope it helps.
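
For context, the keys of package_data are package names and the patterns are relative to that package's directory, so a key such as 'data' only matches if data is itself an importable package. Below is a minimal sketch of the relevant arguments, assuming the CSV has been moved to spamclassifier/data/spam_collection.csv; the SETUP_KWARGS name and the argv guard are illustrative choices, not part of setuptools.

```python
import sys

# Assumed layout (hypothetical, adjust to the real project):
#   setup.py
#   spamclassifier/__init__.py
#   spamclassifier/SpamClassifier.py
#   spamclassifier/data/spam_collection.csv
SETUP_KWARGS = dict(
    name="spamclassifier",
    version="0.1.1",
    packages=["spamclassifier"],
    # Key: package name. Patterns: relative to the package directory.
    package_data={"spamclassifier": ["data/*.csv"]},
    install_requires=["nltk", "pandas"],
)

# Only call setup() when a command such as "sdist" is given, so that
# importing or running this file without arguments has no side effects.
if __name__ == "__main__" and len(sys.argv) > 1:
    import setuptools

    setuptools.setup(**SETUP_KWARGS)
```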

Sagar Dawda
  • I tried this and still get the same error. I also tried https://stackoverflow.com/questions/23252344/how-do-i-include-non-py-files-in-pypi, and uploading to PyPI instead of TestPyPI, with no luck :( – Kabilesh Jun 12 '19 at 04:00
  • Ideally your data folder should be inside the `spamclassifier` directory. Can you move it there and check? – Sagar Dawda Jun 12 '19 at 07:11
  • It seems that package_data works for bdist but not for sdist. I tried both, and bdist includes the CSV file while sdist does not. https://stackoverflow.com/questions/7522250/how-to-include-package-data-with-setuptools-distribute – Kabilesh Jun 12 '19 at 08:58
  • By using "recursive-include spamclassifier *.csv" in MANIFEST.in I could upload the CSV file. But which path should I then use to read this CSV in my Python file? In "fullCorpus = pd.read_csv("data/spam_collection.csv", sep="\t", header=None)", what should the path be? The file is in spamclassifier/data. – Kabilesh Jun 12 '19 at 09:15
  • Try `spamclassifier/data/spam_collection.csv`. That should work. – Sagar Dawda Jun 12 '19 at 17:54
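
A note on the path question: a relative path is resolved against the current working directory of whoever runs the code, not against the installed package, so it can still fail after pip install. A minimal sketch of one common alternative, assuming the CSV is installed in spamclassifier/data next to SpamClassifier.py, is to build the path from the module's own location. The helper name data_file_path is hypothetical.

```python
from pathlib import Path


def data_file_path(module_file, *parts):
    """Return a path located under the directory that contains module_file."""
    return Path(module_file).resolve().parent.joinpath(*parts)


# Inside spamclassifier/SpamClassifier.py (assumed layout) this would be used as:
#   csv_path = data_file_path(__file__, "data", "spam_collection.csv")
#   fullCorpus = pd.read_csv(csv_path, sep="\t", header=None)
```

Because the path is derived from __file__, it points into site-packages after installation and into the source tree during development, independent of the working directory.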