Questions tagged [spark-jdbc]
78 questions
11
votes
1 answer
Spark: Difference between numPartitions in read.jdbc(..numPartitions..) and repartition(..numPartitions..)
I'm confused about the behaviour of the numPartitions parameter in the following methods:
DataFrameReader.jdbc
Dataset.repartition
The official docs for DataFrameReader.jdbc say the following about the numPartitions parameter:
numPartitions:
the…

y2k-shubham
- 10,183
- 11
- 55
- 131
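
A minimal sketch of the two calls in question, assuming a placeholder URL, table, and credentials: numPartitions in read.jdbc controls how many parallel JDBC queries Spark issues while loading, whereas repartition only reshuffles a DataFrame that is already loaded.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-partitions").getOrCreate()

# Partitioned read: Spark issues numPartitions parallel queries, each covering
# a slice of [lowerBound, upperBound) on the partition column.
df = spark.read.jdbc(
    url="jdbc:postgresql://dbhost:5432/shop",    # placeholder URL
    table="orders",                              # placeholder table
    column="order_id",                           # numeric partition column
    lowerBound=1,
    upperBound=1000000,
    numPartitions=8,
    properties={"user": "user", "password": "secret"},
)
print(df.rdd.getNumPartitions())    # typically 8

# repartition: the data is already in Spark; this triggers a shuffle and has no
# effect on how many JDBC queries were used to load it.
df16 = df.repartition(16)
print(df16.rdd.getNumPartitions())  # 16
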
8
votes
1 answer
How to use azure-sqldb-spark connector in pyspark
I want to write around 10 GB of data every day to an Azure SQL Server DB using PySpark. Currently I am using the JDBC driver, which takes hours making insert statements one by one.
I am planning to use the azure-sqldb-spark connector, which claims to turbo boost the…

Ajay Kumar
- 81
- 1
- 3
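
For comparison, a minimal sketch of a plain Spark JDBC write (not the azure-sqldb-spark connector), assuming placeholder URL, table, and credentials: batchsize makes the writer send batched INSERTs instead of one statement per row, and the number of partitions controls how many connections write in parallel.

# Plain JDBC write with batched inserts; df is the DataFrame to be written.
(df.repartition(8)
   .write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")  # placeholder
   .option("dbtable", "dbo.events")   # placeholder
   .option("user", "user")
   .option("password", "secret")
   .option("batchsize", "10000")      # rows per JDBC batch instead of row-by-row inserts
   .mode("append")
   .save())
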
5
votes
0 answers
How to get the base type of an array type in portable JDBC
If you have a table with a column whose type is SQL ARRAY, how do you find the base type of the array type, aka the type of the individual elements of the array type?
How do you do this in vendor-agnostic pure JDBC?
How do you do this without…

Joshua Maurice
- 51
- 2
5
votes
1 answer
Prepared statement in spark-jdbc
I am trying to read data from an MSSQL database using Spark JDBC with a specified offset, so the data should be loaded only after a specified timestamp, which serves as the offset. I tried to implement it by providing a query in jdbc…

Cassie
- 2,941
- 8
- 44
- 92
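
Spark's JDBC source does not expose prepared statements directly; one common workaround is to inline the offset filter in a subquery passed as dbtable, so the WHERE clause is pushed down to MSSQL. A sketch with placeholder names (note the timestamp is interpolated, not bound as a parameter):

offset_ts = "2021-06-01 00:00:00"   # the stored offset

pushdown_query = f"(SELECT * FROM dbo.events WHERE updated_at > '{offset_ts}') AS src"

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://dbhost:1433;database=mydb")  # placeholder
      .option("dbtable", pushdown_query)
      .option("user", "user")
      .option("password", "secret")
      .load())
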
4
votes
1 answer
How Spark reads from JDBC and distributes the data
I need clarity about how Spark works under the hood when it comes to fetching data from external databases.
What I understood from the Spark documentation is that if I do not mention attributes like "numPartitions", "lowerBound" and "upperBound", then the read via…

Sukanta Nath
- 41
- 3
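
A quick way to see the default behaviour: without partitionColumn, lowerBound, upperBound and numPartitions, Spark fetches the whole table through a single JDBC connection into one partition (connection details below are placeholders).

df = spark.read.jdbc(
    url="jdbc:mysql://dbhost:3306/retail_db",   # placeholder
    table="orders",
    properties={"user": "user", "password": "secret"},
)
print(df.rdd.getNumPartitions())   # 1: a single task pulls the entire table
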
4
votes
1 answer
Spark JDBC: DataFrameReader fails to read Oracle table with datatype as ROWID
I am trying to read an Oracle table using spark.read.format, and it works great for all tables except a few that have a column with datatype ROWID.
Below is my code:
var df = spark.read.format("jdbc").
option("url", url).
…

Arghya Saha
- 227
- 1
- 4
- 17
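
One workaround that is often suggested is to convert the ROWID inside the query itself, e.g. with Oracle's ROWIDTOCHAR, so Spark only ever sees a plain string column. A sketch with placeholder schema/table names:

rowid_query = "(SELECT ROWIDTOCHAR(ROWID) AS row_id, t.* FROM my_schema.my_table t) q"

df = (spark.read
      .format("jdbc")
      .option("url", url)            # same JDBC URL as above
      .option("dbtable", rowid_query)
      .option("user", "user")
      .option("password", "secret")
      .load())
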
4
votes
2 answers
Pseudocolumn in Spark JDBC
I am using a query to fetch data from MySQL as follows:
var df = spark.read.format("jdbc")
.option("url", "jdbc:mysql://10.0.0.192:3306/retail_db")
.option("driver" ,"com.mysql.jdbc.Driver")
.option("user",…

clear sky
- 43
- 4
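
If the underlying problem is that MySQL has no ROWNUM-style pseudocolumn to partition on, one hedged alternative is the predicates argument of read.jdbc: each WHERE fragment becomes one partition and one JDBC query. The table name and date ranges below are made up:

predicates = [
    "order_date >= '2021-01-01' AND order_date < '2021-07-01'",
    "order_date >= '2021-07-01' AND order_date < '2022-01-01'",
]
df = spark.read.jdbc(
    url="jdbc:mysql://10.0.0.192:3306/retail_db",
    table="orders",                               # placeholder table
    predicates=predicates,                        # one partition per predicate
    properties={"user": "user", "password": "secret",
                "driver": "com.mysql.jdbc.Driver"},
)
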
3
votes
0 answers
Apache Spark write to MySQL with JDBC connector (Write Mode: Ignore) is not performing as expected
I have my tables stored in MySQL with ID as the primary key.
I want to write to MySQL using Spark in such a way that it ignores the rows in the dataframe that already exist in MySQL (based on the primary key) and only writes the new set of rows.
ID (PK) | Name |…

freezthinker
- 51
- 5
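
Part of the mismatch may be that SaveMode.Ignore works at table granularity, not row granularity: if the target table already exists, Spark skips the write entirely rather than filtering out duplicate primary keys. A sketch with placeholder names; row-level de-duplication usually needs a staging table plus INSERT IGNORE or an upsert on the MySQL side.

# mode("ignore"): if `customers` already exists, nothing is written at all;
# Spark does not compare rows against the primary key.
(df.write
   .format("jdbc")
   .option("url", "jdbc:mysql://dbhost:3306/retail_db")   # placeholder
   .option("dbtable", "customers")                        # placeholder
   .option("user", "user")
   .option("password", "secret")
   .mode("ignore")
   .save())
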
3
votes
0 answers
Can't join on jdbc tables with common column names in spark 2.3
In an earlier version of Spark I had two SQL tables:
t1: (id, body)
t2: (id, name)
I could query them like:
spark.read.jdbc("t1 inner join t2 on t1.id = t2.id")
.selectExpr("name", "body")
Which would generate the following query:
…

Fletcher Stump Smith
- 107
- 8
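
For reference, a sketch of the usual workaround when the two JDBC tables are loaded separately: join by column name so the duplicate id does not become ambiguous (connection details are placeholders).

props = {"user": "user", "password": "secret"}
t1 = spark.read.jdbc("jdbc:postgresql://dbhost:5432/db", "t1", properties=props)
t2 = spark.read.jdbc("jdbc:postgresql://dbhost:5432/db", "t2", properties=props)

# Passing the join key by name keeps a single `id` column in the result,
# so selecting "name" and "body" is unambiguous.
joined = t1.join(t2, on="id").select("name", "body")
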
2
votes
0 answers
How to properly use foreachBatch() method in PySpark?
I am trying to sink results processed by the Structured Streaming API in Spark to PostgreSQL. I tried the following approach (somewhat simplified, but I hope it's clear):
class Processor:
def __init__(self, args):
self.spark_session =…

papi
- 23
- 4
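
A minimal sketch of the foreachBatch pattern with a JDBC sink; stream_df stands in for the streaming DataFrame from the question, and the URL and table are placeholders. Each micro-batch DataFrame is written with the ordinary batch writer:

def write_to_postgres(batch_df, epoch_id):
    # Invoked once per micro-batch; the JDBC write itself runs like any batch write.
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://dbhost:5432/analytics")  # placeholder
        .option("dbtable", "events")                               # placeholder
        .option("user", "user")
        .option("password", "secret")
        .mode("append")
        .save())

query = (stream_df.writeStream
         .foreachBatch(write_to_postgres)
         .outputMode("update")
         .start())
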
2
votes
1 answer
Apache Spark - passing jdbc connection object to executors
I am creating a JDBC connection object in the Spark driver and I am using it in the executors to access the DB. My concern is: is it the same connection object, or do the executors get a copy of the connection object so that there would be a separate connection per…

Suparn Lele
- 23
- 3
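
Driver-side objects are serialized and shipped to the executors (or fail to serialize at all), so a connection created on the driver is never shared. The usual pattern is to open a connection inside each task, e.g. with foreachPartition; connect() below is a hypothetical helper wrapping whatever database driver is in use.

def save_partition(rows):
    # Runs on an executor: each partition opens and closes its own connection,
    # so nothing tries to reuse the driver's connection object.
    conn = connect()   # hypothetical helper around your DB driver
    try:
        for row in rows:
            conn.execute("INSERT INTO events VALUES (%s, %s)", (row.id, row.value))
        conn.commit()
    finally:
        conn.close()

df.foreachPartition(save_partition)
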
2
votes
1 answer
Spark SQL table read error 'Caused by: org.apache.spark.sql.AnalysisException: Invalid usage of '*' in expression 'unresolvedextractvalue''
I have written sample Java Spark SQL code locally in Eclipse to read data from a remote Databricks database table, as shown below. I have set HADOOP_HOME and included the Spark JDBC driver too, but I still get the error below on every run.
static…

Sai Karthik N
- 21
- 1
- 4
2
votes
1 answer
PySpark pyspark.sql.DataFrameReader.jdbc() doesn't accept datetime type upperbound argument as document says
I found the documentation for the jdbc function in PySpark 3.0.1 at
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader, and it says:
column – the name of a column of numeric, date, or timestamp type that
will be used…

syan
- 165
- 1
- 10
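
The keyword form of DataFrameReader.jdbc() ends up coercing lowerBound/upperBound to integers, which is where datetime arguments fail. Since Spark 2.4 the option-based reader accepts date/timestamp partition columns with string bounds, so one hedged workaround looks like this (connection details are placeholders):

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/shop")   # placeholder
      .option("dbtable", "orders")                           # placeholder
      .option("user", "user")
      .option("password", "secret")
      .option("partitionColumn", "created_at")               # timestamp column
      .option("lowerBound", "2020-01-01 00:00:00")
      .option("upperBound", "2021-01-01 00:00:00")
      .option("numPartitions", "12")
      .load())
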
2
votes
1 answer
Loading data from Oracle table using spark JDBC is extremely slow
I am trying to read 500 million records from a table using Spark JDBC and then perform a join on those tables.
When I execute the SQL from SQL Developer it takes 25 minutes.
But when I load this using Spark JDBC it takes forever; the last time it ran…

Atharv Thakur
- 671
- 3
- 21
- 39
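
Two knobs that often matter here, sketched with placeholder connection details and bounds: parallelise the scan with a partition column, and raise fetchsize, since the Oracle JDBC driver fetches only a handful of rows per round trip by default.

df = (spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  # placeholder
      .option("dbtable", "sales.transactions")                   # placeholder
      .option("user", "user")
      .option("password", "secret")
      .option("partitionColumn", "transaction_id")               # numeric, roughly uniform column
      .option("lowerBound", "1")
      .option("upperBound", "500000000")
      .option("numPartitions", "32")
      .option("fetchsize", "10000")                              # rows per round trip
      .load())
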
2
votes
1 answer
Loading data using sparkJDBCDataset with jars not working
When using a sparkJDBCDataset to load a table using a JDBC connection, I keep running into an error that Spark cannot find my driver. The driver definitely exists on the machine and its directory is specified inside the spark.yml file under…

Weiyi Yin
- 70
- 5
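
Whatever kedro does with spark.yml, the driver jar ultimately has to be on Spark's classpath. A hedged sketch of the equivalent SparkSession configuration with a placeholder jar path; note that spark.driver.extraClassPath only takes effect if it is set before the driver JVM starts (e.g. in spark-defaults.conf, spark.yml, or on spark-submit), not on an already-running session.

from pyspark.sql import SparkSession

jar = "/opt/jars/mssql-jdbc-8.4.1.jre8.jar"   # placeholder path to the JDBC driver

spark = (SparkSession.builder
         .appName("jdbc-load")
         .config("spark.jars", jar)                    # ship the jar to driver and executors
         .config("spark.driver.extraClassPath", jar)   # must be set before the driver JVM starts
         .getOrCreate())
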