In my code below I try to instantiate redis-py connection using env variable at URL. The problem is that when I use foreach or foreachPartition the env variable isn't recognized in #save_on_redis method.
I just try to create redis connection outside, but I receive "pickle.PicklingError: Can't pickle 'lock' object", because spark try to run these two methods, at the same time, on all nodes.
Question: How I can use env variables on the method passed as argument to foreach or foreachPartition ?
import os
from pyspark.sql import SparkSession
import redis
spark = (SparkSession
.builder
.getOrCreate())
print "---------"
print os.getenv("REDIS_REPORTS_URL")
print "---------"
def save_on_redis(row):
redis_ = redis.StrictRedis(host=os.getenv("REDIS_REPORTS_URL"), port=6379, db=0)
print os.getenv("REDIS_REPORTS_URL")
print redis_
redis_.set("#teste#", "fagner")
df = spark.createDataFrame([(0,1), (0,1), (0,2)], ["id", "score"])
df.foreach(save_on_redis)