I want to do parallel processing in a for loop using PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('yarn').appName('myAppName').getOrCreate()
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

data = ['a', 'b', 'c']
for i in data:
    try:
        # Read the parquet data for this bucket, register it as a view, and query it
        df = spark.read.parquet('gs://' + i + '-data')
        df.createOrReplaceTempView("people")
        df2 = spark.sql("select * from people")
        df2.show()
    except Exception as e:
        print(e)
        continue
The above script works fine, but I want to run these iterations in parallel in PySpark, the way it is possible in Scala.
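For reference, something like the following is the kind of thing I am after. It is only a rough sketch using Python's concurrent.futures to submit the Spark jobs from driver-side threads; the pool size and the per-iteration view names (people_<i>, to avoid threads clobbering a shared view) are my own assumptions, not part of the original script:

from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('yarn').appName('myAppName').getOrCreate()
spark.conf.set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "false")

data = ['a', 'b', 'c']

def process(i):
    # Each thread submits its own Spark job through the shared SparkSession;
    # the Spark scheduler can then run the jobs concurrently on the cluster.
    try:
        df = spark.read.parquet('gs://' + i + '-data')
        # Use a unique view name per iteration so concurrent threads
        # do not overwrite each other's temp view.
        df.createOrReplaceTempView("people_" + i)
        df2 = spark.sql("select * from people_" + i)
        df2.show()
    except Exception as e:
        print(e)

with ThreadPoolExecutor(max_workers=3) as pool:
    pool.map(process, data)

Is this a reasonable approach in PySpark, or is there a more idiomatic way to do it?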