I am working on a PySpark project; below is my project directory structure.
project_dir/
    src/
        etl/
            __init__.py
            etl_1.py
            spark.py
        config/
            __init__.py
        utils/
            __init__.py
    test/
        test_etl_1.py
    setup.py
    README.md
    requirements.txt
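For context, src/etl/spark.py exposes the get_spark() helper that the test imports. Its exact contents should not matter for this question, but it is roughly the usual shared-SparkSession pattern (a minimal sketch; everything except the get_spark name is illustrative):

# src/etl/spark.py (approximate contents; the real file may differ)
from pyspark.sql import SparkSession

def get_spark():
    # Return a shared SparkSession, creating it on first use.
    return (SparkSession.builder
            .master("local[*]")
            .appName("etl")
            .getOrCreate())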
When I run the unit test code below with

python test_etl_1.py

I get:
Traceback (most recent call last):
  File "test_etl_1.py", line 1, in <module>
    from src.etl.spark import get_spark
ImportError: No module named src.etl.spark
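Running the same test through pytest fails with the same module error. The invocation is along these lines (from the project root; the exact command may vary):

cd project_dir
python -m pytest test/test_etl_1.py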
This is my unit test file, test/test_etl_1.py:
from src.etl.spark import get_spark
from src.etl.addcol import with_status


class TestAppendCol(object):

    def test_with_status(self):
        source_data = [
            ("p", "w", "pw@sample.com"),
            ("j", "b", "jb@sample.com"),
        ]
        source_df = get_spark().createDataFrame(
            source_data,
            ["first_name", "last_name", "email"]
        )

        actual_df = with_status(source_df)

        expected_data = [
            ("p", "w", "pw@sample.com", "added"),
            ("j", "b", "jb@sample.com", "added"),
        ]
        expected_df = get_spark().createDataFrame(
            expected_data,
            ["first_name", "last_name", "email", "status"]
        )

        assert expected_df.collect() == actual_df.collect()
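For completeness, with_status is expected to append a constant "status" column. Note that addcol.py does not appear in the tree above; I assume it sits alongside spark.py in src/etl/ and looks roughly like this (a minimal sketch, not the exact file):

# src/etl/addcol.py (approximate contents)
from pyspark.sql.functions import lit

def with_status(df):
    # Append a constant "status" column with the value "added".
    return df.withColumn("status", lit("added"))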
I need to run this file with pytest, but it fails with the module error shown above. Can you please help me resolve this error?