Questions tagged [apache-spark-3.0]

27 questions
7
votes
2 answers

Does Spark support a WITH clause like SQL?

I have a table employee_1 in Spark with attributes id and name (with data), and another table employee_2 with the same attributes. I want to load the data while increasing the id values by +1. My WITH clause is shown below: WITH EXP AS (SELECT ALIASNAME.ID+1…
Ganesh Kumar
  • 133
  • 1
  • 3
  • 12
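Spark SQL does support common table expressions, so a query along these lines should work via spark.sql(...). A minimal sketch, assuming the table and column names from the question (employee_1, employee_2, id, name) — not the asker's exact query:

```sql
-- Hypothetical sketch: select from employee_1 through a CTE, shifting id by +1.
WITH exp AS (
  SELECT id + 1 AS id, name
  FROM employee_1
)
SELECT id, name FROM exp
```

The resulting DataFrame could then be written to the second table, e.g. with `spark.sql(query).write.insertInto("employee_2")`.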
6
votes
3 answers

Is Star Schema (data modelling) still relevant with the Lake House pattern using Databricks?

The more I read about the Lake House architectural pattern and following the demos from Databricks I hardly see any discussion around Dimensional Modelling like in a traditional data warehouse (Kimball approach). I understand the compute and storage…
2
votes
1 answer

Aggregate function with Expr in PySpark 3.0.3

The following code works well with PySpark 3.2.1: df.withColumn( "total_amount", f.aggregate(f.col("taxes"), f.lit(0.00), lambda acc, x: acc + x["amount"]), ) I've downgraded to PySpark 3.0.3. How do I change the above code to something like…
Smaillns
  • 2,540
  • 1
  • 28
  • 40
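The Python wrapper pyspark.sql.functions.aggregate only arrived in 3.1, but the underlying SQL higher-order function has been available since Spark 2.4, so on 3.0.x the same fold can usually be expressed through f.expr. A hedged sketch of the SQL expression, assuming the taxes/amount names from the question:

```sql
-- Same fold as the 3.2.1 code, written as a SQL higher-order function:
aggregate(taxes, CAST(0.00 AS DOUBLE), (acc, x) -> acc + x.amount)
```

Wrapped in PySpark this would look like `df.withColumn("total_amount", f.expr("aggregate(taxes, CAST(0.00 AS DOUBLE), (acc, x) -> acc + x.amount)"))`.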
2
votes
0 answers

ImportError: Pandas >= 0.23.2 must be installed; however, it was not found. / pyspark/pandas are not properly imported in Apache Spark 3.2.1

I have an Apache Spark 3.2.1 Docker container running and the code below. The 3.2.1 version includes pandas, so I changed the import line to "from pyspark import pandas as ps", but I am still getting the error …
suj
  • 507
  • 1
  • 8
  • 22
2
votes
3 answers

How to get week of month in Spark 3.0+?

I cannot find any datetime formatting pattern to get the week of month in Spark 3.0+. As use of 'W' is deprecated, is there a solution to get the week of month without using the legacy option? The below code doesn't work for Spark 3.2.1: df =…
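A common workaround is to derive the week of month arithmetically from the day of month, e.g. `ceil(dayofmonth(col)/7)` inside `F.expr(...)` in Spark. Note this counts fixed 7-day blocks starting on the 1st, which can differ from the calendar-aligned semantics of the old 'W' pattern. The same arithmetic in plain Python:

```python
import math
from datetime import date

def week_of_month(d: date) -> int:
    """Week of month as fixed 7-day blocks: days 1-7 -> 1, 8-14 -> 2, ..."""
    return math.ceil(d.day / 7)

print(week_of_month(date(2022, 3, 1)))   # day 1  -> week 1
print(week_of_month(date(2022, 3, 15)))  # day 15 -> week 3
```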
1
vote
0 answers

How to provide hive metastore information via spark-submit?

Using Spark 3.1, I need to provide the Hive configuration via the spark-submit command (not inside the code). Inside the code (which is not the solution I need), I can do the following, which works fine (able to list databases and select from tables)…
Itération 122442
  • 2,644
  • 2
  • 27
  • 73
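Hadoop/Hive properties that would normally live in hive-site.xml can usually be passed on the command line by prefixing them with spark.hadoop. A hedged sketch (the metastore host, warehouse path, and script name are placeholders, not values from the question):

```shell
spark-submit \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.hadoop.hive.metastore.uris=thrift://METASTORE_HOST:9083 \
  --conf spark.sql.warehouse.dir=/path/to/warehouse \
  my_job.py
```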
1
vote
0 answers

How to suppress INFO Spark logs?

I am experimenting with Apache Spark 3 in IntelliJ by creating a simple standalone Scala application. When I run my program I get lots of INFO logs. Based on various SO answers I tried all of the…
Mandroid
  • 6,200
  • 12
  • 64
  • 134
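For a standalone application (as opposed to a spark-submit deployment), Spark up to 3.2 ships log4j 1.x, so a log4j.properties on the classpath usually takes effect. A sketch, assuming an sbt/Maven-style resources directory:

```properties
# src/main/resources/log4j.properties (Spark <= 3.2 uses log4j 1.x)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

A programmatic alternative is `spark.sparkContext.setLogLevel("WARN")`, though that only takes effect after the context starts, so startup INFO lines still appear.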
1
vote
0 answers

Issues defining an Aggregator with case class input

I'm trying to define a custom aggregation function which takes a StructType field as an input, using the Aggregator API with Dataframes. Spark version is 3.1.2. Here's a reduced example (basic one-field case class, being passed in as a Row and…
1
vote
1 answer

How to set driver python path in cluster mode (pyspark)

My program runs fine in client mode, but when I try to run it in cluster mode it fails; the reason is that the Python version on the cluster nodes is different. I am trying to set the Python driver path when my application runs in cluster mode; below…
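In cluster mode the driver runs on a cluster node, so the driver's interpreter path has to exist there, not on the submitting machine. The usual knobs are the spark.pyspark.python and spark.pyspark.driver.python configs (or the PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON environment variables). A hedged sketch with placeholder paths:

```shell
spark-submit \
  --deploy-mode cluster \
  --conf spark.pyspark.python=/usr/bin/python3 \
  --conf spark.pyspark.driver.python=/usr/bin/python3 \
  app.py
```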
1
vote
0 answers

How to force Spark to move records with non-null fields into _corrupt_record?

Consider the code: import com.amazonaws.auth.DefaultAWSCredentialsProviderChain import org.apache.spark.sql.SparkSession import org.apache.spark.sql.types.{StringType, StructField, StructType} object JsonAwsSchemaExample extends App{ val…
Cherry
  • 31,309
  • 66
  • 224
  • 364
1
vote
1 answer

How to round timestamp to 10 minutes in Spark 3.0?

I have a timestamp like this in $"my_col": 2022-01-21 22:11:11. With date_trunc("minute", $"my_col") I get 2022-01-21 22:11:00, and with date_trunc("hour", $"my_col") I get 2022-01-21 22:00:00. What is a Spark 3.0 way to get 2022-01-21 22:10:00?
Eljah
  • 4,188
  • 4
  • 41
  • 85
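One approach that avoids any version-specific function is plain epoch arithmetic: floor the Unix timestamp to a 600-second boundary and cast back, e.g. something like `(floor(unix_timestamp($"my_col") / 600) * 600).cast("timestamp")` in Spark (a sketch, not tested here). The underlying arithmetic in plain Python:

```python
from datetime import datetime, timezone

def floor_to_10_minutes(ts: datetime) -> datetime:
    """Floor a timestamp to the previous 10-minute boundary via epoch arithmetic."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % 600, tz=timezone.utc)

ts = datetime(2022, 1, 21, 22, 11, 11, tzinfo=timezone.utc)
print(floor_to_10_minutes(ts))  # 2022-01-21 22:10:00+00:00
```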
0
votes
0 answers

Spark 3.3.1 automatically picks up the current date in a data frame if the date is missing from the given timestamp, and does not mark it as _corrupt_record

I am using Spark 3.3.1 to read an input CSV file with the header and value below: ID, CREATE_DATE 1, 14:42:23.0 I'm passing only the time (HH:MM:SS.SSS) whereas the date (YYYY-MM-DD) is missing in the CREATE_DATE field, and reading the CREATE_DATE field as…
0
votes
0 answers

Spark Scala app getting NullPointerException while migrating in databricks from DBR 7.3 LTS(spark 3.0.1) to 9.1 LTS(spark 3.1.2)

We are migrating our Spark Scala jobs from AWS EMR (6.2.1, Spark version 3.0.1) to Lakehouse, and a few of our jobs are failing due to NullPointerException. When we lower the Databricks Runtime environment to 7.3 LTS, it works fine…
PPPP
  • 561
  • 1
  • 4
  • 14
0
votes
1 answer

Unable to set "spark.driver.maxResultSize" in Spark 3.0

I am trying to convert a Spark dataframe into a pandas dataframe. I have a sufficiently large driver. I am trying to set the spark.driver.maxResultSize value like this: spark = ( SparkSession .builder .appName('test') …
Ayan Biswas
  • 1,641
  • 9
  • 39
  • 66
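A likely cause here: builder.getOrCreate() returns any already-running session and silently ignores new configs, and spark.driver.maxResultSize must be in place before the driver JVM starts. Setting it outside the code, at submit time, usually sidesteps both issues. A sketch (the 4g value and script name are placeholders):

```shell
spark-submit --conf spark.driver.maxResultSize=4g app.py
```

When setting it in code instead, stopping any existing session first (spark.stop()) before building a new one with the config is typically required.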
0
votes
0 answers

Migrating from Spark 2.4 to Spark 3: How to convert a class that extends SharedSQLContext to use object SparkSession?

In Spark 2.4 there exists the class SharedSQLContext, and the related APIs have been removed in Spark 3. The equivalent of SharedSQLContext from Spark 2.4 is the SparkSession object in Spark 3. I'm relatively new to Scala/Java; how do I approach converting…
sojim2
  • 1,245
  • 2
  • 15
  • 38