Questions tagged [apache-spark-2.2]

36 questions
3
votes
1 answer

Read a text file in PySpark 2

I am trying to read a text file in Spark 2.3 using Python, but I get this error. This is the format the text file is in: name marks amar 100 babul 70 ram 98 krish 45 Code: df=spark.read.option("header","true")\ .option("delimiter"," ")\ …
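In Spark 2.x this layout is usually read with `spark.read.option("header","true").option("delimiter"," ").csv(path)`. Since no cluster is available here, the sketch below uses the standard-library `csv.DictReader` as a runnable stand-in for the same header-plus-space-delimiter parse; the sample data is taken from the question.

```python
import csv
import io

# Sample data in the same space-delimited, header-first layout as the question.
raw = "name marks\namar 100\nbabul 70\nram 98\nkrish 45\n"

# Stand-in for spark.read.option("header", "true").option("delimiter", " ").csv(path):
# DictReader treats the first row as the header when fieldnames is not given.
rows = list(csv.DictReader(io.StringIO(raw), delimiter=" "))

for row in rows:
    print(row["name"], row["marks"])
```

The same idea carries over to Spark: the header option consumes the first line as column names, and the delimiter option splits the remaining lines on spaces.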
3
votes
1 answer

Spark 2.x - How to generate simple Explain/Execution Plan

I am hoping to generate an explain/execution plan in Spark 2.2 with some actions on a dataframe. The goal here is to ensure that partition pruning is occurring as expected before I kick off the job and consume cluster resources. I tried a Spark…
user9074332
  • 2,336
  • 2
  • 23
  • 39
3
votes
1 answer

Scala String Interpolation with Underscore

I am new to Scala, so feel free to point me in the direction of documentation, but I was not able to find an answer to this question in my research. I am using Scala 2.11.8 with Spark 2.2 and trying to create a dynamic string containing…
user9074332
  • 2,336
  • 2
  • 23
  • 39
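The usual trap behind this question: in Scala, `s"$col_suffix"` parses `col_suffix` as one identifier, so a trailing underscore gets swallowed; the fix is the braced form `s"${col}_suffix"`. Python's `string.Template` follows the same identifier rule, so it makes a convenient runnable stand-in for demonstrating the behavior (the variable names here are made up for illustration):

```python
from string import Template

# In Scala, s"$prefix_suffix" treats "prefix_suffix" as one identifier;
# string.Template applies the same rule, so the unbraced form fails.
unbraced = Template("$prefix_suffix")
try:
    unbraced.substitute(prefix="tbl")
except KeyError as exc:
    print("underscore swallowed into the identifier:", exc)

# Braces delimit the variable name explicitly, as ${...} does in Scala.
braced = Template("${prefix}_suffix")
result = braced.substitute(prefix="tbl")
print(result)
```

The takeaway transfers directly: whenever an interpolated name is followed by a character that is legal inside an identifier (letters, digits, underscores), wrap it in braces.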
2
votes
0 answers

Spark is unable to read Arabic characters and replaces them with "?" on reading

When I try to read data from a DB that contains Arabic characters, they get replaced with "?". I'm using Java with Spark 2.2. I tried a few things, such as encoding with UTF-8, but nothing worked.
Ann Poh
  • 41
  • 2
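A "?" in place of Arabic text usually means the data was transcoded somewhere (often in the JDBC connection's default charset) through an encoding that cannot represent those characters, and the codec substituted a replacement character. This is a hedged illustration of the mechanism, not a fix for any particular driver; ASCII here stands in for whatever narrow charset the connection is using:

```python
# "?" typically appears when text passes through an encoding that cannot
# represent Arabic, and the codec substitutes a replacement character.
arabic = "مرحبا"  # "hello"

# Round-tripping through ASCII (a stand-in for a misconfigured connection
# charset) destroys every character:
mangled = arabic.encode("ascii", errors="replace").decode("ascii")
print(mangled)

# Round-tripping through UTF-8 preserves the text intact:
intact = arabic.encode("utf-8").decode("utf-8")
print(intact)
```

The practical consequence: fixing the encoding on the Spark side after the fact cannot help, because the information is already lost by the time "?" arrives; the connection or database charset is where to look.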
2
votes
1 answer

How to get the row corresponding to the minimum value of some column in a Spark Scala DataFrame

I have the following code, which creates df3. I want to get the minimum value of distance_n and also the entire row containing that minimum value. //it gives just the min value, but I want the entire row containing that min…
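In Spark 2.x this is commonly solved with `orderBy("distance_n").limit(1)` or a `min` over a struct, rather than a plain aggregate (which, as the question notes, returns only the value). As a runnable stand-in, plain Python's `min` with a key makes the distinction clear: it returns the whole element, not the key's value. The rows below are invented sample data using the question's column name:

```python
# Stand-in for selecting the full row holding the minimum of one column
# (in Spark 2.x: df3.orderBy("distance_n").limit(1), or min over a struct).
rows = [
    {"id": 1, "distance_n": 7.5},
    {"id": 2, "distance_n": 2.1},
    {"id": 3, "distance_n": 4.9},
]

# min() with a key returns the entire element, not just the minimum value.
closest = min(rows, key=lambda r: r["distance_n"])
print(closest)
```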
2
votes
1 answer

Timestamp formats and time zones in Spark (scala API)

UPDATE: As suggested in the comments, I eliminated the irrelevant part of the code. My requirements: unify the number of milliseconds to 3; transform the string to a timestamp and keep the value in UTC. Create the dataframe: val df =…
Playing With BI
  • 411
  • 1
  • 9
  • 20
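The two requirements in that question (pad the fractional seconds to exactly three digits, then parse as UTC) can be sketched without Spark. In Spark itself this would typically go through `to_timestamp` with a format string plus the session time zone; the pure-Python helper below mirrors the same normalization, with a hypothetical function name:

```python
from datetime import datetime, timezone

def to_utc_timestamp(raw: str) -> datetime:
    """Pad or truncate the fractional part to exactly 3 digits
    (milliseconds) and parse the string as a UTC timestamp."""
    if "." in raw:
        base, frac = raw.split(".")
    else:
        base, frac = raw, ""
    millis = (frac + "000")[:3]              # unify fractional digits to 3
    normalized = f"{base}.{millis}"
    dt = datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")
    return dt.replace(tzinfo=timezone.utc)   # keep the value in UTC

print(to_utc_timestamp("2018-01-01 12:00:00.12"))
print(to_utc_timestamp("2018-01-01 12:00:00"))
```

Note that `%f` accepts one to six fractional digits, so the padding step is what makes inputs with differing precision come out uniform.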
2
votes
2 answers

Kerberos authentication in Kudu for a Spark 2 job

I am trying to put some data into Kudu, but the worker cannot find the Kerberos token, so I am not able to write to the Kudu database. Here is my spark2-submit statement: spark2-submit --master yarn "spark.yarn.maxAppAttempts=1"…
Lukas
  • 31
  • 1
  • 3
2
votes
0 answers

Hadoop config settings made through spark-shell seem to have no effect

I'm trying to edit the Hadoop block size configuration through the Spark shell so that the generated Parquet part files are of a specific size. I tried setting several variables this way: val blocksize:Int =…
Sparky
  • 743
  • 6
  • 15
  • 28
2
votes
1 answer

Strange behavior on CSV parser of Spark 2 when multiLine option is enabled

When creating a DataFrame from a CSV file with the multiLine option enabled, some columns are parsed incorrectly. Here is the code execution; I'll point out the strange behavior as the code goes. First, I load the file into two variables:…
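For context on what `multiLine` is supposed to do: it tells Spark 2's CSV parser that a quoted field may contain line breaks, so one logical record can span several physical lines. Python's `csv` module applies the same quoting rule by default, which makes it a small runnable stand-in for what a *correct* multiline parse looks like (the sample data is invented):

```python
import csv
import io

# A quoted field spanning two physical lines -- the case Spark 2's
# multiLine option exists for.
raw = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]
print(data[0])  # the embedded newline stays inside a single field
```

Without multiline handling, a parser would instead split the quoted record at the embedded newline and misalign every following column, which matches the symptom the question describes.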
2
votes
2 answers

Spark bucketing read performance

Spark version - 2.2.1. I've created a bucketed table with 64 buckets, and I'm executing an aggregation query: select t1.ifa,count(*) from $tblName t1 where t1.date_ = '2018-01-01' group by ifa. I can see 64 tasks in the Spark UI, which utilize just…
2
votes
1 answer

Why does collect_set aggregation add Exchange operator to join query of bucketed tables?

I'm using Spark 2.2 and doing a POC of Spark's bucketing. I've created a bucketed table; here's the desc formatted my_bucketed_tbl output: +--------------------+--------------------+-------+ | col_name| …
Modi
  • 2,200
  • 4
  • 23
  • 37
1
vote
0 answers

Spark serializes variable value as null instead of its real value

My understanding of how Spark distributes code to the nodes that run it is merely cursory, and I cannot get my code to run successfully with Spark's mapPartitions API when I want to instantiate a class for each partition, with…
matanster
  • 15,072
  • 19
  • 88
  • 167
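The usual remedy for driver-side state arriving as null on the executors is to construct per-partition objects *inside* the function handed to `mapPartitions`, so each worker builds its own copy instead of depending on a driver-side instance surviving serialization. This is a hedged pure-Python sketch of that pattern; `Parser` and `process_partition` are invented names, and the list-of-lists merely simulates how `mapPartitions` feeds each partition's records in as an iterator:

```python
# Per-partition state should be constructed inside the function handed to
# mapPartitions, so every executor builds its own instance rather than
# relying on one serialized from the driver.

class Parser:
    def __init__(self):
        self.prefix = "row"        # state that must exist on the executor

    def parse(self, value):
        return f"{self.prefix}:{value}"

def process_partition(records):
    parser = Parser()              # instantiated once per partition
    for record in records:
        yield parser.parse(record)

# Simulate two partitions the way mapPartitions would iterate them:
partitions = [[1, 2], [3]]
result = [list(process_partition(p)) for p in partitions]
print(result)
```

The one-instance-per-partition shape keeps construction cost low (unlike per-record instantiation) while avoiding serialization of the object altogether.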
1
vote
2 answers

ClassNotFound with Oozie, Azure HDInsight & Spark2

After researching for a week, I had to put in this request. Environment: Azure HDInsight. Oozie version: "Oozie client build version: 4.2.0.2.6.5.3004-13". Spark: Spark2. My program: a simple Scala program that reads a file, i.csv, and writes the same into…
1
vote
1 answer

Dynamic allocation with Spark Streaming on YARN not scaling down executors

I'm using Spark Streaming (Spark 2.2) on a YARN cluster and am trying to enable dynamic core allocation for my application. The number of executors scales up as required, but once executors are assigned, they are not scaled down even when…
1
vote
1 answer

Spark - Operation not allowed: alter table replace columns

It looks like Hive's replace columns is not working with Spark 2.2.1, nor with 2.3.1. alterSchemaSql : alter table myschema.mytable replace columns (a int,b int,d int) Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:…
nir
  • 3,743
  • 4
  • 39
  • 63