I have an S3 repository containing a 'main.py' file, along with custom modules that I built (inside 'Cache' and 'Helpers'):
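Roughly, the layout looks like this (only the relevant files are shown; the exact location of 'spark_main.py' is my assumption, based on the import in the code below):

```
main.py
spark_main.py      <- assumed to sit next to main.py ("from spark_main import spark_process")
Cache/
    redis_main.py
    ...
Helpers/
    ...
```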
My 'main.py' file looks like this:
```python
from pyspark.sql import SparkSession

from spark_main import spark_process
from Cache import redis_main

# Raw string so the backslashes in the path are kept literally
file_root = r'flexible_dates_emr\parquet_examples\pe_1.parquet'
city_pairs = [('TLV', 'NYC', 'NYC', 'TLV'), ('TLV', 'ROM', 'ROM', 'TLV')]

def main():
    spark = SparkSession.builder.appName('Test').getOrCreate()
    spark_data = spark_process(spark, file_root, city_pairs)
    redis_main.redis_update_from_older_file(spark_data)
    print(spark_data)

if __name__ == '__main__':
    main()
```
I have an EMR cluster with all of the project's requirements installed, and it works well, but when I try to import a module such as 'spark_process' or 'redis_main', my task fails.
I guess the reason is that the cluster doesn't recognize the files the modules live in.
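From what I've read, it seems I may need to zip the modules and ship them to the cluster with '--py-files' when submitting the step, roughly like the commands below (the bucket name 'my-bucket', the 'code/' prefix and 'modules.zip' are just placeholders, and I'm assuming 'Cache' and 'Helpers' are regular Python packages sitting next to 'main.py'), but I'm not sure whether this is the right approach:

```
# Zip the custom modules that main.py imports
# (assumption: Cache/, Helpers/ and spark_main.py sit next to main.py)
zip -r modules.zip Cache/ Helpers/ spark_main.py

# Upload the code to S3 ('my-bucket' and the 'code/' prefix are placeholders)
aws s3 cp modules.zip s3://my-bucket/code/modules.zip
aws s3 cp main.py s3://my-bucket/code/main.py

# Submit the Spark step so the zip is distributed to the driver and the executors
spark-submit --deploy-mode cluster \
  --py-files s3://my-bucket/code/modules.zip \
  s3://my-bucket/code/main.py
```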
How can I use the modules? Thanks.