Questions tagged [apache-spark-2.2]
36 questions
3
votes
1 answer
Read a text file in PySpark 2
I am trying to read a text file in Spark 2.3 using Python, but I get an error.
This is the format the text file is in:
name marks
amar 100
babul 70
ram 98
krish 45
Code:
df=spark.read.option("header","true")\
.option("delimiter"," ")\
…

abhishek anand
- 35
- 1
- 8
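A minimal sketch of what that space-delimited read should produce. In PySpark the usual form is spark.read.option("header", "true").option("delimiter", " ").csv(path); note that .csv(...), not .text(...), is what honors the delimiter option. The parse is illustrated here in plain Python so the expected rows are visible, using the file contents from the question:

```python
# Plain-Python sketch of what the Spark read should yield, row by row.
raw = """name marks
amar 100
babul 70
ram 98
krish 45"""

lines = raw.splitlines()
header = lines[0].split(" ")                      # ['name', 'marks']
rows = [dict(zip(header, line.split(" "))) for line in lines[1:]]
print(rows[0])  # {'name': 'amar', 'marks': '100'}
```

Without inferSchema, Spark would likewise read marks as strings; add .option("inferSchema", "true") if numeric types are wanted.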
3
votes
1 answer
Spark 2.x - How to generate simple Explain/Execution Plan
I am hoping to generate an explain/execution plan in Spark 2.2 with some actions on a dataframe. The goal here is to ensure that partition pruning is occurring as expected before I kick off the job and consume cluster resources. I tried a Spark…

user9074332
- 2,336
- 2
- 23
- 39
3
votes
1 answer
Scala String Interpolation with Underscore
I am new to Scala, so feel free to point me to documentation, but I was not able to find an answer to this question in my research.
I am using Scala 2.11.8 with Spark 2.2 and am trying to create a dynamic string containing…

user9074332
- 2,336
- 2
- 23
- 39
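The usual pitfall behind this question: in Scala, underscores are legal identifier characters, so s"$prefix_2018" looks up a variable named prefix_2018 instead of interpolating prefix, and the fix is to brace the expression, s"${prefix}_2018". The braced form is the same idea Python's f-strings use, sketched below with a hypothetical prefix variable:

```python
prefix = "sales"  # hypothetical value, for illustration only

# Braces delimit the interpolated expression explicitly, so the
# trailing underscore and digits are plain literal text.
table = f"{prefix}_2018"
print(table)  # sales_2018
```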
2
votes
0 answers
Spark is unable to read Arabic characters and replaces them with "?" on reading
When I try to read data from a DB which has Arabic characters, they get replaced with "?".
I'm using Java Spark 2.2.
I tried a few things, like encoding with UTF-8, but nothing worked.

Ann Poh
- 41
- 2
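The "?" characters are the classic sign of a lossy charset conversion somewhere between the database and the JVM rather than a Spark bug; on the JDBC side the usual fix is forcing UTF-8 in the connection URL (for example MySQL's characterEncoding=UTF-8 parameter, an assumption here, since the question does not name the database). A plain-Python sketch of the effect:

```python
text = "مرحبا"  # "hello" in Arabic

# Round-tripping through a charset that cannot represent Arabic
# silently substitutes '?', which is exactly the symptom described.
lossy = text.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # ?????

# Round-tripping through UTF-8 preserves the text intact.
intact = text.encode("utf-8").decode("utf-8")
print(intact == text)  # True
```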
2
votes
1 answer
How to get the row corresponding to the minimum value of some column in a Spark Scala DataFrame
I have the following code, which creates df3. I want to get the minimum value of distance_n and also the entire row containing that minimum value.
// it gives just the min value, but I want the entire row containing that min…

stackoverflow
- 59
- 1
- 1
- 11
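In Spark the usual approaches are df3.orderBy("distance_n").limit(1) (simple, but sorts) or a window/min-over-struct trick. The underlying idea, returning the whole row whose column value is minimal rather than just the value, is the same as Python's min with a key, sketched here on toy rows:

```python
# Toy rows standing in for df3; distance_n is the column to minimize.
rows = [
    {"id": 1, "distance_n": 4.2},
    {"id": 2, "distance_n": 0.7},
    {"id": 3, "distance_n": 3.1},
]

# min() keyed by the column returns the entire record,
# not just the minimal value.
closest = min(rows, key=lambda r: r["distance_n"])
print(closest)  # {'id': 2, 'distance_n': 0.7}
```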
2
votes
1 answer
Timestamp formats and time zones in Spark (scala API)
******* UPDATE ********
As suggested in the comments I eliminated the irrelevant part of the code:
My requirements:
Unify number of milliseconds to 3
Transform string to timestamp and keep the value in UTC
Create dataframe:
val df =…

Playing With BI
- 411
- 1
- 9
- 20
2
votes
2 answers
kerberos authentication in Kudu for spark2 job
I am trying to put some data into Kudu, but the worker cannot find the Kerberos token, so I am not able to write to the Kudu database.
Here you can see my spark2-submit statement:
spark2-submit --master yarn "spark.yarn.maxAppAttempts=1"…

Lukas
- 31
- 1
- 3
2
votes
0 answers
Hadoop Config settings through spark-shell seems to have no effect
I'm trying to edit the Hadoop block size configuration through spark-shell so that the generated Parquet part files are of a specific size. I tried setting several variables this way:
val blocksize:Int =…

Sparky
- 743
- 6
- 15
- 28
2
votes
1 answer
Strange behavior in Spark 2's CSV parser when the multiLine option is enabled
When creating a DataFrame from a CSV file with the multiLine option enabled, some file columns are parsed incorrectly.
Here is the code execution; I'll point out the strange behavior as the code goes.
First, I load the file in two variables:…

Fernando Lemos
- 297
- 3
- 9
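A common cause of this symptom: with multiLine enabled, Spark's CSV parser still uses backslash as its default escape character, while RFC-4180 files escape quotes by doubling them, so quoted fields shift; adding .option("escape", "\"") to the read often fixes it. For contrast, here is how a standards-compliant parser (Python's csv module) handles a quoted field containing a newline:

```python
import csv
import io

# A quoted field that spans two physical lines, RFC-4180 style.
raw = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # ['1', 'first line\nsecond line']
```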
2
votes
2 answers
Spark bucketing read performance
Spark version - 2.2.1.
I've created a bucketed table with 64 buckets, and I'm executing the aggregation query select t1.ifa, count(*) from $tblName t1 where t1.date_ = '2018-01-01' group by ifa. I can see 64 tasks in the Spark UI, which utilize just…

Modi
- 2,200
- 4
- 23
- 37
2
votes
1 answer
Why does collect_set aggregation add Exchange operator to join query of bucketed tables?
I'm using Spark-2.2.
I'm POCing Spark's bucketing.
I've created a bucketed table; here's the desc formatted my_bucketed_tbl output:
+--------------------+--------------------+-------+
| col_name| …

Modi
- 2,200
- 4
- 23
- 37
1
vote
0 answers
Spark serializes variable value as null instead of its real value
My understanding of how Spark distributes code to the nodes that run it is merely cursory, and I cannot get my code to run successfully with Spark's mapPartitions API when I wish to instantiate a class for each partition, with…

matanster
- 15,072
- 19
- 88
- 167
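A common resolution for this symptom: fields initialized on the driver arrive as null on executors when they are not serializable (or are marked transient), so the object should be constructed inside the mapPartitions function, once per partition, rather than captured from the driver. The pattern is sketched in plain Python below with a hypothetical Parser class standing in for the non-serializable resource:

```python
# Sketch of the "instantiate inside the partition function" pattern.
class Parser:
    def __init__(self):
        self.prefix = ">> "          # stands in for a non-serializable resource
    def parse(self, line):
        return self.prefix + line

def process_partition(lines):
    parser = Parser()                # built once per partition, on the worker,
    return [parser.parse(l) for l in lines]  # never shipped from the driver

partitions = [["a", "b"], ["c"]]     # toy stand-in for an RDD's partitions
result = [process_partition(p) for p in partitions]
print(result)  # [['>> a', '>> b'], ['>> c']]
```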
1
vote
2 answers
ClassNotFound with Oozie, Azure HDInsight & Spark2
After researching for a week, I had to post this request:
Environment: Azure HDInsight
Oozie version: "Oozie client build version: 4.2.0.2.6.5.3004-13"
Spark: Spark2
My program: a simple Scala program reads a file, i.csv, and writes the same into…

Eyedia Tech
- 135
- 1
- 11
1
vote
1 answer
Dynamic Allocation with spark streaming on yarn not scaling down executors
I'm using Spark Streaming (Spark 2.2) on a YARN cluster and am trying to enable dynamic core allocation for my application.
The number of executors scales up as required, but once executors are assigned they are not scaled down, even when…

Nihal Agarwal
- 21
- 8
1
vote
1 answer
Spark - Operation not allowed: alter table replace columns
It looks like Hive's replace columns is not working with Spark 2.2.1, and also with 2.3.1.
alterSchemaSql : alter table myschema.mytable replace columns (a int,b int,d int)
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:…

nir
- 3,743
- 4
- 39
- 63