Questions tagged [spark-redshift]

28 questions
2
votes
1 answer

How to make an existing column NOT NULL in AWS Redshift?

I dynamically created a table through a Glue job and it is working fine. But per a new requirement, I need to add a new column that generates unique values and should be the primary key in Redshift. I implemented the same using…
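Since Redshift cannot alter an existing column to NOT NULL in place, one plausible workaround is to rebuild the table with an IDENTITY primary-key column. A minimal sketch in Python with psycopg2; the endpoint, schema, and column names are hypothetical:

```python
import psycopg2

# Hypothetical connection details.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="admin", password="...",
)
with conn, conn.cursor() as cur:
    # Redshift can't ALTER a column to NOT NULL, so build a new table
    # with an auto-generated, non-null identity key and swap it in.
    cur.execute("""
        CREATE TABLE my_schema.my_table_new (
            id BIGINT IDENTITY(1, 1) NOT NULL,
            col_a VARCHAR(256),
            col_b INT,
            PRIMARY KEY (id)
        )
    """)
    cur.execute("""
        INSERT INTO my_schema.my_table_new (col_a, col_b)
        SELECT col_a, col_b FROM my_schema.my_table
    """)
    cur.execute("ALTER TABLE my_schema.my_table RENAME TO my_table_old")
    cur.execute("ALTER TABLE my_schema.my_table_new RENAME TO my_table")
```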
2
votes
4 answers

java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.&lt;init&gt;(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V

I am trying to read Redshift table data into a Spark DataFrame and write that DataFrame to another Redshift table. I am using the following .jar files in spark-submit for this task. Here is the command: spark-submit --jars…
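This NoSuchMethodError usually means the aws-java-sdk on the classpath does not match the version the S3 transfer code was built against. A sketch of pinning mutually compatible artifacts from PySpark; the version numbers here are illustrative assumptions, not a verified matrix:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("redshift-copy")
    # Versions must agree with each other; these are assumptions.
    .config(
        "spark.jars.packages",
        ",".join([
            "com.databricks:spark-redshift_2.11:3.0.0-preview1",
            "org.apache.hadoop:hadoop-aws:2.7.3",
            "com.amazonaws:aws-java-sdk:1.7.4",  # the SDK hadoop-aws 2.7.x expects
        ]),
    )
    .getOrCreate()
)
```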
2
votes
0 answers

Connecting SparkR with Redshift: Failed to find data source: com.databricks.spark.redshift

I have a Spark cluster set up with Amazon EMR and RStudio installed on top of it. I am trying to connect SparkR to Redshift through the package spark-redshift_2.11-0.5.0.jar, during which I am facing the error Failed to find data source:…
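"Failed to find data source" generally means the connector class never made it onto the driver/executor classpath. A PySpark sketch of the fix (the same --jars/--packages flags apply to a SparkR session launched through spark-submit); the jar path and connection values are hypothetical:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # The connector jar, plus its dependencies (spark-avro, the Redshift
    # JDBC driver), must all be on the classpath.
    .config("spark.jars", "/opt/jars/spark-redshift_2.11-0.5.0.jar")
    .getOrCreate()
)

df = (
    spark.read.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://host:5439/dev?user=admin&password=...")
    .option("dbtable", "my_table")
    .option("tempdir", "s3n://my-bucket/tmp")  # staging area the connector needs
    .load()
)
```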
2
votes
1 answer

How to write a PySpark DataFrame to Redshift?

I am trying to write a PySpark DataFrame to Redshift but it results in an error: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated Caused…
murtaza1983
  • 247
  • 2
  • 8
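The AvroFileFormat instantiation error typically points at a spark-avro build that does not match the running Spark version. A hedged sketch of a write with the versions aligned; the artifact versions, URL, and table names are assumptions:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config(
        "spark.jars.packages",
        "com.databricks:spark-redshift_2.11:3.0.0-preview1,"
        "com.databricks:spark-avro_2.11:4.0.0",  # must match the Spark version
    )
    .getOrCreate()
)
df = spark.createDataFrame([(1, "a")], ["id", "val"])

(
    df.write.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://host:5439/dev?user=admin&password=...")
    .option("dbtable", "target_table")
    .option("tempdir", "s3a://my-bucket/tmp")
    .option("forward_spark_s3_credentials", "true")
    .mode("append")
    .save()
)
```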
1
vote
1 answer

Check whether a Spark format exists or not

Context: the Spark reader has the function format, which is used to specify a data source type, for example JSON, CSV, or a third party such as com.databricks.spark.redshift. Help: how can I check whether a third-party format exists or not? Let me give a case. In…
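One best-effort approach is to ask Spark's own resolver whether the format name maps to a class. This goes through an internal API whose signature varies across Spark versions, so treat it strictly as a sketch:

```python
from pyspark.sql import SparkSession
from py4j.protocol import Py4JJavaError

spark = SparkSession.builder.getOrCreate()

def format_exists(spark, fmt):
    """Return True if Spark can resolve `fmt` to a data source class.
    Uses an internal resolver (Spark 2.3+ signature shown)."""
    try:
        spark._jvm.org.apache.spark.sql.execution.datasources.DataSource \
            .lookupDataSource(fmt, spark._jsparkSession.sessionState().conf())
        return True
    except Py4JJavaError:
        return False

print(format_exists(spark, "csv"))                            # True
print(format_exists(spark, "com.databricks.spark.redshift"))  # True only if the jar is on the classpath
```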
1
vote
1 answer

Error writing a DataFrame to Redshift using PySpark with boolean columns

In my script, PySpark's write method takes a DataFrame and writes it to Redshift; however, some DataFrames have boolean columns that produce an error stating that Redshift does not accept the bit data type. My question is, since it says that…
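The usual cause: Spark's plain JDBC writer maps BooleanType to BIT(1), which Redshift rejects. One workaround is to override the generated DDL per column with the createTableColumnTypes option (available since Spark 2.2); the names and URL below are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, True), (2, False)], ["id", "is_active"])

(
    df.write.format("jdbc")
    .option("url", "jdbc:redshift://host:5439/dev")
    .option("dbtable", "my_schema.flags")
    .option("user", "admin")
    .option("password", "...")
    # Force BOOLEAN instead of the default BIT(1) mapping.
    .option("createTableColumnTypes", "is_active BOOLEAN")
    .mode("overwrite")
    .save()
)
```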
1
vote
0 answers

400 : Bad Request, py4j.protocol.Py4JJavaError: An error occurred while calling o44.save

After some research I am able to connect to Redshift using PySpark and can read table data into a Spark DataFrame. Now I am trying to insert that DataFrame into another Redshift table (with the same structure). Here is the code I am using to connect to…
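A 400 Bad Request while the connector stages data in S3 is often a region/signature mismatch (V4-only regions). A sketch of one common fix; the region and endpoint are assumptions:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # V4-only S3 regions require signature version 4 on driver and executors.
    .config("spark.driver.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    .config("spark.executor.extraJavaOptions",
            "-Dcom.amazonaws.services.s3.enableV4=true")
    .getOrCreate()
)
# Point s3a at the bucket's regional endpoint (region is hypothetical).
spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.endpoint", "s3.us-east-2.amazonaws.com"
)
```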
1
vote
1 answer

Issue while connecting Spark to Redshift using the spark-redshift connector

I need to connect Spark to my Redshift instance to generate data. I am using Spark 1.6 with Scala 2.10, and have used a compatible JDBC connector and spark-redshift connector. But I am facing a weird problem, that is: I am using…
Aldrin Machado
  • 97
  • 1
  • 10
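For reference, a minimal read through the connector on Spark 1.6 goes through SQLContext rather than SparkSession. The URL, table, and tempdir below are hypothetical:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

df = (
    sqlContext.read.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://host:5439/dev?user=admin&password=...")
    .option("dbtable", "my_table")
    .option("tempdir", "s3n://my-bucket/tmp")  # the connector unloads via S3
    .load()
)
df.show()
```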
1
vote
0 answers

PySpark issue with timestamp casts when reading a MySQL DB

Python 2.7, PySpark 2.2.1, JDBC format for MySQL -> Spark DF. For writing Spark DF -> AWS Redshift I am using the `spark-redshift` driver from Databricks. I am reading data into Spark from MySQL tables for my application; due to the context and depending…
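A common mitigation is to pin one canonical time zone end to end so timestamps are not shifted between MySQL, Spark, and Redshift. A sketch below; the JDBC URL flags are MySQL Connector/J options and the table name is hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Available from Spark 2.2: make Spark interpret timestamps in UTC.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df = (
    spark.read.format("jdbc")
    .option("url",
            "jdbc:mysql://host:3306/db"
            "?useLegacyDatetimeCode=false&serverTimezone=UTC")
    .option("dbtable", "events")
    .option("user", "reader")
    .option("password", "...")
    .load()
)
```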
1
vote
2 answers

Unable to connect to S3 using spark-redshift library in java

I am trying to create a table in Redshift based on a Spark dataset, using the spark-redshift driver over JDBC to achieve this locally. The code snippet to do this: data.write() .format("com.databricks.spark.redshift") .option("url",…
Ritika
  • 73
  • 1
  • 9
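The connector stages data in the tempdir through Hadoop's S3 filesystem, so S3 credentials must be set on the Hadoop configuration as well, not just in the Redshift URL. A PySpark sketch (the same hadoopConfiguration() calls work from Java); the key values are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3n.awsAccessKeyId", "AKIA...")   # placeholder
hadoop_conf.set("fs.s3n.awsSecretAccessKey", "...")   # placeholder
# Alternatively, let the connector forward these credentials to COPY:
#   .option("forward_spark_s3_credentials", "true")
```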
0
votes
0 answers

Spark Error: Could not initialize class org.apache.spark.rdd.RDDOperationScope$

I'm trying to print rows from my Spark DataFrame in Amazon SageMaker. I created the Spark DataFrame by reading a table from a Redshift database. Printing the full table alone shows the column names and types; however, trying to show the actual…
0
votes
0 answers

Databricks format in Pyspark to write in Redshift

I am migrating data from Postgres to Redshift using the JDBC format, but with Redshift some JDBC options, like escape, are not available. So I thought to use the format com.databricks.spark.redshift to write using PySpark.…
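The Databricks connector loads through S3 + COPY and lets you append raw COPY clauses via its extracopyoptions setting, which covers cases the plain JDBC writer cannot. A sketch; the URL, table, bucket, and chosen COPY options are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a|b")], ["id", "payload"])

(
    df.write.format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://host:5439/dev?user=admin&password=...")
    .option("dbtable", "target_table")
    .option("tempdir", "s3a://my-bucket/tmp")
    .option("forward_spark_s3_credentials", "true")
    # Extra clauses appended verbatim to Redshift's COPY statement.
    .option("extracopyoptions", "TRUNCATECOLUMNS BLANKSASNULL")
    .mode("append")
    .save()
)
```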
0
votes
0 answers

Writing data to Redshift using JDBC

I am trying to write a DataFrame to a Redshift table with the following code using a JDBC connection. It is running very slowly (more than 20 hours to process). The DataFrame has 100 partitions. Can you suggest how we can improve the performance of writing the df…
Bab
  • 177
  • 2
  • 6
  • 17
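Plain JDBC INSERTs into Redshift are row-oriented and slow; two levers that often help are batching the inserts and reducing concurrent connections, though the S3 + COPY path of the Databricks connector is usually far faster for bulk loads. Values below are illustrative, not tuned:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1000000).withColumnRenamed("id", "row_id")

(
    df.coalesce(8)                 # fewer, larger partitions => fewer connections
    .write.format("jdbc")
    .option("url", "jdbc:redshift://host:5439/dev")
    .option("dbtable", "my_schema.big_table")
    .option("user", "admin")
    .option("password", "...")
    .option("batchsize", 10000)    # rows per JDBC batch insert
    .mode("append")
    .save()
)
```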
0
votes
1 answer

How to optimize Redshift table for simple DELETE or SELECT queries?

I have DELETE queries in Redshift that take up to 40 seconds in production. The queries are created programmatically and look like EXPLAIN DELETE FROM platform.myTable WHERE id IN…
jn5047
  • 101
  • 1
  • 7
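DELETE ... WHERE id IN scans according to the table's sort order, so making the filtered column the sort key (then vacuuming and analyzing) is the usual first lever. A psycopg2 sketch that rebuilds the table; identifiers and connection details are hypothetical:

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="prod", user="admin", password="...",
)
conn.autocommit = True  # VACUUM cannot run inside a transaction block
cur = conn.cursor()
# Rebuild with the DELETE predicate column as dist/sort key so scans
# can skip blocks.
cur.execute("""
    CREATE TABLE platform.myTable_sorted
    DISTKEY(id) SORTKEY(id)
    AS SELECT * FROM platform.myTable
""")
cur.execute("VACUUM platform.myTable_sorted")
cur.execute("ANALYZE platform.myTable_sorted")
```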
0
votes
0 answers

Is there any way I can retain spaces in Redshift while writing from AWS Glue?

I am trying to store spaces in a varchar column in Redshift. My data comes in CSV format and looks like this: "id","first_name","last_name","doj","address" "A1111","B1111","C1111","D111","E111" "A2222","B22222",""," ","E22" "A3333"," …
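If the CSV is read with Spark's own reader (rather than a Glue DynamicFrame), whitespace trimming can be switched off explicitly; both options below are standard Spark CSV options, and the path is hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = (
    spark.read
    .option("header", True)
    .option("quote", '"')
    # Keep leading/trailing spaces inside quoted fields intact.
    .option("ignoreLeadingWhiteSpace", False)
    .option("ignoreTrailingWhiteSpace", False)
    .csv("s3://my-bucket/input/people.csv")
)
```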