
I am working on a PySpark project; below is my project directory structure.

project_dir/
    src/
        etl/
            __init__.py
            etl_1.py
            spark.py
        config/
            __init__.py
        utils/
            __init__.py
    test/
        test_etl_1.py
    setup.py
    README.md
    requirements.txt

When I run the unit test below, I get:

python test_etl_1.py

Traceback (most recent call last):
  File "test_etl_1.py", line 1, in <module>
    from src.etl.spark import get_spark
ImportError: No module named src.etl.spark

This is my unit test file:

from src.etl.spark import get_spark
from src.etl.addcol import with_status

class TestAppendCol(object):

  def test_with_status(self):

    source_data = [
        ("p", "w", "pw@sample.com"),
        ("j", "b", "jb@sample.com")
    ]
    source_df = get_spark().createDataFrame(
        source_data,
        ["first_name", "last_name", "email"]
    )

    actual_df = with_status(source_df)

    expected_data = [
        ("p", "w", "pw@sample.com", "added"),
        ("j", "b", "jb@sample.com", "added")
    ]
    expected_df = get_spark().createDataFrame(
        expected_data,
        ["first_name", "last_name", "email", "status"]
    )

    assert(expected_df.collect() == actual_df.collect())

I need to run this file with pytest, but it is not working due to the module error. Can you please help me with this error?

nilesh1212
  • Does this answer your question? [Using pytest with a src layer](https://stackoverflow.com/questions/50155464/using-pytest-with-a-src-layer) – hoefling Jun 04 '20 at 11:18

2 Answers


Your source code lives in src, and the packages are etl, config, and utils. So update your imports as below.

from etl.spark import get_spark
from etl.addcol import with_status

Make sure PYTHONPATH points to the project_dir/src directory.
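If you run the tests with pytest, a conftest.py at the project root is one way to put src/ on the path without exporting PYTHONPATH by hand. This is a sketch assuming the directory layout shown in the question:

```python
# conftest.py -- placed in project_dir/, where pytest picks it up automatically.
import os
import sys

# Prepend project_dir/src to sys.path so `from etl.spark import get_spark`
# resolves when pytest imports the test modules.
SRC_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "src")
if SRC_DIR not in sys.path:
    sys.path.insert(0, SRC_DIR)
```

Then run pytest from project_dir/ (e.g. `pytest test/`), and the etl, config, and utils packages are importable without their src. prefix.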

Ranga Vure
  • Hi Ranga, I tried your solution.. I am getting below error Traceback (most recent call last): File "test-etl_1.py", line 1, in from etl.spark import get_spark ImportError: No module named etl.spark – nilesh1212 Jun 03 '20 at 17:11
  • I am running python test_etl_1.py – nilesh1212 Jun 03 '20 at 17:12

Your PYTHONPATH depends on your current working directory. Given that you say you run python test_etl_1.py, you must be in ~/project_dir/test/. Therefore, Python can't find src.

If you run python -m unittest from ~/project_dir/, it should work. If not, you can always try to fix/improve the installation of your package as shown here.
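For the installation route, a minimal setup.py for a src layout could look like the sketch below. The package name and version are placeholders, and it assumes the directory structure from the question:

```python
# setup.py -- minimal sketch for a src layout (name/version are placeholders).
from setuptools import setup, find_packages

setup(
    name="my_etl_project",       # placeholder project name
    version="0.1.0",
    package_dir={"": "src"},     # tell setuptools the packages live under src/
    packages=find_packages(where="src"),
)
```

After `pip install -e .` from project_dir/, the etl, config, and utils packages become importable from anywhere, including the tests.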

Chiel