
I am trying to deploy PySpark locally using the instructions at

https://spark.apache.org/docs/latest/api/python/getting_started/install.html#using-pypi

I can see that extra dependencies are available, such as sql and pandas_on_spark, which can be installed with

pip install pyspark[sql,pandas_on_spark]

But how can we find all available extras?

Looking in the JSON metadata of the pyspark package (based on https://wiki.python.org/moin/PyPIJSON)

https://pypi.org/pypi/pyspark/json

I could not find the possible extra dependencies (as described in "What is 'extra' in pypi dependency?"); the value of requires_dist is null.
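
For reference, this is roughly the check described above, as a minimal sketch using only the standard library (the field names follow the PyPI JSON API):

    import json
    import urllib.request

    # Fetch the PyPI JSON metadata for pyspark and inspect requires_dist,
    # where extras would normally show up via markers like: extra == "sql"
    with urllib.request.urlopen("https://pypi.org/pypi/pyspark/json") as response:
        data = json.load(response)

    print(data["info"]["requires_dist"])  # None for pyspark at the time of writing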

Many thanks for your help.

karpan
  • As far as I know, you can not easily. If it is not documented, then you will have to look at the code/config for the packaging. In this case, here: https://github.com/apache/spark/blob/eb30a27e53158e64fffaa6d32ff9369ffbae0384/python/setup.py#L262-L274 -- `ml`, `mllib`, `sql`, `pandas_on_spark`. – sinoroc Mar 27 '22 at 11:34
  • If you already installed pyspark, you can use a workaround described [here](https://stackoverflow.com/a/63603540/6942134) to list its extras. – SergiyKolesnikov Aug 07 '23 at 16:32
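
To illustrate the workaround from the last comment, here is a minimal sketch using importlib.metadata (Python 3.8+), assuming pyspark is already installed in the current environment:

    from importlib.metadata import metadata

    # The installed package's metadata declares one Provides-Extra entry per extra;
    # this only works for packages installed in the current environment.
    pyspark_metadata = metadata("pyspark")
    print(pyspark_metadata.get_all("Provides-Extra"))  # e.g. ['ml', 'mllib', 'sql', 'pandas_on_spark']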

1 Answer


As far as I know, you cannot easily get the list of extras. If the list is not clearly documented, then you will have to look at the code/config for the packaging. In this case, that is the setup.py in the Spark repository (https://github.com/apache/spark/blob/eb30a27e53158e64fffaa6d32ff9369ffbae0384/python/setup.py#L262-L274), which gives the following list: ml, mllib, sql, and pandas_on_spark.
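
For context, extras are declared as the keys of the extras_require argument to setup(). The snippet below is a simplified, hypothetical illustration of that pattern, not the actual pyspark setup.py; the package name and dependencies are placeholders:

    from setuptools import setup

    setup(
        name="example-package",
        version="0.1.0",
        # Each key of extras_require is an installable extra, selected with e.g.
        # pip install example-package[sql,pandas_on_spark]
        extras_require={
            "ml": ["numpy"],                    # placeholder dependencies
            "mllib": ["numpy"],
            "sql": ["pandas", "pyarrow"],
            "pandas_on_spark": ["pandas", "pyarrow", "numpy"],
        },
    )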

sinoroc