Fetch data from redis using AWS Glue (python)

Question

I am trying to read data from Redis in an AWS Glue job (Python). How can I connect to Redis from the Spark context? Redis is hosted in the same AWS region.

I found the following sample code on the Redis website, but I could not find an equivalent for PySpark.

import com.redislabs.provider.redis._
import org.apache.spark.{SparkConf, SparkContext}

...

val sc = new SparkContext(new SparkConf()
      .setMaster("local")
      .setAppName("myApp")

      // initial redis host - can be any node in cluster mode
      .set("redis.host", "localhost")

      // initial redis port
      .set("redis.port", "6379")

      // optional redis AUTH password
      .set("redis.auth", "")
  )

Is it possible to connect to Redis from PySpark?

See this question: https://stackoverflow.com/questions/32274540/write-data-to-redis-from-pyspark — zhiwen, Aug 21 '18 at 10:55
@zhiwen I have updated the question. How do I add that zip file when using AWS Glue? — Lijo Jose, Aug 21 '18 at 11:33

score 1 · Answer 1 · answered Aug 21 '18 at 13:40

Q: What data sources does AWS Glue support?

AWS Glue natively supports data stored in Amazon Aurora, Amazon RDS for MySQL, Amazon RDS for Oracle, Amazon RDS for PostgreSQL, Amazon RDS for SQL Server, Amazon Redshift, and Amazon S3, as well as MySQL, Oracle, Microsoft SQL Server, and PostgreSQL databases in your Virtual Private Cloud (Amazon VPC) running on Amazon EC2. The metadata stored in the AWS Glue Data Catalog can be readily accessed from Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum. You can also write custom Scala or Python code and import custom libraries and Jar files into your Glue ETL jobs to access data sources not natively supported by AWS Glue. For more details on importing custom libraries, refer to our documentation.
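As a rough illustration of the "custom Python code and custom libraries" route, here is a minimal sketch that reads Redis hashes with the redis-py client instead of a Spark connector. It assumes the `redis` package has been made available to the Glue job (for example through the `--extra-py-files` or `--additional-python-modules` job parameters) and that the job has network access to the Redis endpoint. The helper names `fetch_hashes` and `make_client`, the host name, and the `user:*` key pattern are all hypothetical.

```python
from typing import Dict, List, Optional


def fetch_hashes(client, pattern: str = "user:*") -> List[Dict[str, str]]:
    """Read every Redis hash whose key matches `pattern` into a list of rows.

    `client` is expected to behave like redis.Redis created with
    decode_responses=True, so keys and values come back as str.
    """
    rows = []
    # SCAN iterates incrementally, so it avoids blocking the server like KEYS can.
    for key in client.scan_iter(match=pattern):
        row = dict(client.hgetall(key))
        row["redis_key"] = key  # keep the originating key as an extra column
        rows.append(row)
    return rows


def make_client(host: str, port: int = 6379, password: Optional[str] = None):
    """Create a redis-py client (hypothetical helper; adjust to your setup)."""
    # Imported lazily so fetch_hashes can be exercised without redis-py installed.
    import redis

    return redis.Redis(host=host, port=port, password=password,
                       decode_responses=True)


# Inside the Glue job (assumption: `spark` is the job's SparkSession and all
# hashes share the same fields):
#   client = make_client("my-redis.example.internal")  # hypothetical host
#   rows = fetch_hashes(client, "user:*")
#   df = spark.createDataFrame(rows)
```

This collects everything on the driver, so it is only reasonable for data sets that fit in driver memory; for larger volumes a Spark connector such as the one in the question would likely be a better fit.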

According to this FAQ entry, there should be a way to do it by importing a custom library. Please give it a try. — zhiwen, Aug 21 '18 at 13:42
