Questions tagged [apache-spark-2.2]

36 questions
3
votes
1 answer

Read a text file in PySpark 2

I am trying to read a text file in Spark 2.3 using Python, but I get this error. This is the format the text file is in: name marks amar 100 babul 70 ram 98 krish 45 Code: df=spark.read.option("header","true")\ .option("delimiter"," ")\ …
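In Spark 2.x this layout is usually read with `spark.read.option("header","true").option("delimiter"," ").csv(path)`. Since no cluster is available here, the sketch below uses the standard-library `csv.DictReader` as a runnable stand-in for the same header-plus-space-delimiter parse; the sample data is taken from the question.

```python
import csv
import io

# Sample data in the same space-delimited, header-first layout as the question.
raw = "name marks\namar 100\nbabul 70\nram 98\nkrish 45\n"

# Stand-in for spark.read.option("header", "true").option("delimiter", " ").csv(path):
# DictReader treats the first row as the header when fieldnames is not given.
rows = list(csv.DictReader(io.StringIO(raw), delimiter=" "))

for row in rows:
    print(row["name"], row["marks"])
```

The same idea carries over to Spark: the header option consumes the first line as column names, and the delimiter option splits the remaining lines on spaces.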
3
votes
1 answer

Spark 2.x - How to generate simple Explain/Execution Plan

I am hoping to generate an explain/execution plan in Spark 2.2 with some actions on a dataframe. The goal here is to ensure that partition pruning is occurring as expected before I kick off the job and consume cluster resources. I tried a Spark…
user9074332
  • 2,336
  • 2
  • 23
  • 39
3
votes
1 answer

Scala String Interpolation with Underscore

I am new to Scala, so feel free to point me in the direction of documentation, but I was not able to find an answer to this question in my research. I am using Scala 2.11.8 with Spark 2.2 and trying to create a dynamic string containing…
user9074332
  • 2,336
  • 2
  • 23
  • 39
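The usual trap behind this question: in Scala, `s"$col_suffix"` parses `col_suffix` as one identifier, so a trailing underscore gets swallowed; the fix is the braced form `s"${col}_suffix"`. Python's `string.Template` follows the same identifier rule, so it makes a convenient runnable stand-in for demonstrating the behavior (the variable names here are made up for illustration):

```python
from string import Template

# In Scala, s"$prefix_suffix" treats "prefix_suffix" as one identifier;
# string.Template applies the same rule, so the unbraced form fails.
unbraced = Template("$prefix_suffix")
try:
    unbraced.substitute(prefix="tbl")
except KeyError as exc:
    print("underscore swallowed into the identifier:", exc)

# Braces delimit the variable name explicitly, as ${...} does in Scala.
braced = Template("${prefix}_suffix")
result = braced.substitute(prefix="tbl")
print(result)
```

The takeaway transfers directly: whenever an interpolated name is followed by a character that is legal inside an identifier (letters, digits, underscores), wrap it in braces.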
2
votes
0 answers

Spark is unable to read Arabic characters and replaces them with "?" on reading

When I try to read data from a DB that contains Arabic characters, they get replaced with "?". I'm using Java with Spark 2.2. I tried a few things, such as encoding with UTF-8, but nothing worked.
Ann Poh
  • 41
  • 2
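A "?" in place of Arabic text usually means the data was transcoded somewhere (often in the JDBC connection's default charset) through an encoding that cannot represent those characters, and the codec substituted a replacement character. This is a hedged illustration of the mechanism, not a fix for any particular driver; ASCII here stands in for whatever narrow charset the connection is using:

```python
# "?" typically appears when text passes through an encoding that cannot
# represent Arabic, and the codec substitutes a replacement character.
arabic = "مرحبا"  # "hello"

# Round-tripping through ASCII (a stand-in for a misconfigured connection
# charset) destroys every character:
mangled = arabic.encode("ascii", errors="replace").decode("ascii")
print(mangled)

# Round-tripping through UTF-8 preserves the text intact:
intact = arabic.encode("utf-8").decode("utf-8")
print(intact)
```

The practical consequence: fixing the encoding on the Spark side after the fact cannot help, because the information is already lost by the time "?" arrives; the connection or database charset is where to look.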
2
votes
1 answer

How to get the row corresponding to the minimum value of some column in a Spark Scala DataFrame

I have the following code, which creates df3. I want to get the minimum value of distance_n and also the entire row containing that minimum value. //it gives just the min value, but I want the entire row containing that min…
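In Spark 2.x this is commonly solved with `orderBy("distance_n").limit(1)` or a `min` over a struct, rather than a plain aggregate (which, as the question notes, returns only the value). As a runnable stand-in, plain Python's `min` with a key makes the distinction clear: it returns the whole element, not the key's value. The rows below are invented sample data using the question's column name:

```python
# Stand-in for selecting the full row holding the minimum of one column
# (in Spark 2.x: df3.orderBy("distance_n").limit(1), or min over a struct).
rows = [
    {"id": 1, "distance_n": 7.5},
    {"id": 2, "distance_n": 2.1},
    {"id": 3, "distance_n": 4.9},
]

# min() with a key returns the entire element, not just the minimum value.
closest = min(rows, key=lambda r: r["distance_n"])
print(closest)
```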
2
votes
1 answer

Timestamp formats and time zones in Spark (scala API)

UPDATE: As suggested in the comments, I eliminated the irrelevant part of the code. My requirements: unify the number of milliseconds to 3; transform the string to a timestamp and keep the value in UTC. Create the dataframe: val df =…
Playing With BI
  • 411
  • 1
  • 9
  • 20
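The two requirements in that question (pad the fractional seconds to exactly three digits, then parse as UTC) can be sketched without Spark. In Spark itself this would typically go through `to_timestamp` with a format string plus the session time zone; the pure-Python helper below mirrors the same normalization, with a hypothetical function name:

```python
from datetime import datetime, timezone

def to_utc_timestamp(raw: str) -> datetime:
    """Pad or truncate the fractional part to exactly 3 digits
    (milliseconds) and parse the string as a UTC timestamp."""
    if "." in raw:
        base, frac = raw.split(".")
    else:
        base, frac = raw, ""
    millis = (frac + "000")[:3]              # unify fractional digits to 3
    normalized = f"{base}.{millis}"
    dt = datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S.%f")
    return dt.replace(tzinfo=timezone.utc)   # keep the value in UTC

print(to_utc_timestamp("2018-01-01 12:00:00.12"))
print(to_utc_timestamp("2018-01-01 12:00:00"))
```

Note that `%f` accepts one to six fractional digits, so the padding step is what makes inputs with differing precision come out uniform.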
2
votes
2 answers

Kerberos authentication in Kudu for a Spark 2 job

I am trying to put some data into Kudu, but the worker cannot find the Kerberos token, so I am not able to write to the Kudu database. Here is my spark2-submit statement: spark2-submit --master yarn "spark.yarn.maxAppAttempts=1"…
Lukas
  • 31
  • 1
  • 3
2
votes
0 answers

Hadoop config settings made through spark-shell seem to have no effect

I'm trying to edit the Hadoop block size configuration through the Spark shell so that the generated Parquet part files are of a specific size. I tried setting several variables this way: val blocksize:Int =…
Sparky
  • 743
  • 6
  • 15
  • 28
2
votes
1 answer

Strange behavior on CSV parser of Spark 2 when multiLine option is enabled

When creating a DataFrame from a CSV file with the multiLine option enabled, some columns are parsed incorrectly. Here is the code execution; I'll point out the strange behavior as the code goes. First, I load the file into two variables:…
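For context on what `multiLine` is supposed to do: it tells Spark 2's CSV parser that a quoted field may contain line breaks, so one logical record can span several physical lines. Python's `csv` module applies the same quoting rule by default, which makes it a small runnable stand-in for what a *correct* multiline parse looks like (the sample data is invented):

```python
import csv
import io

# A quoted field spanning two physical lines -- the case Spark 2's
# multiLine option exists for.
raw = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]
print(data[0])  # the embedded newline stays inside a single field
```

Without multiline handling, a parser would instead split the quoted record at the embedded newline and misalign every following column, which matches the symptom the question describes.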
2
votes
2 answers

Spark bucketing read performance

Spark version - 2.2.1. I've created a bucketed table with 64 buckets, and I'm executing an aggregation query: select t1.ifa,count(*) from $tblName t1 where t1.date_ = '2018-01-01' group by ifa. I can see 64 tasks in the Spark UI, which utilize just…
2
votes
1 answer

Why does collect_set aggregation add Exchange operator to join query of bucketed tables?

I'm using Spark 2.2 and doing a POC of Spark's bucketing. I've created a bucketed table; here's the desc formatted my_bucketed_tbl output: +--------------------+--------------------+-------+ | col_name| …
Modi
  • 2,200
  • 4
  • 23
  • 37
1
vote
0 answers

Spark serializes variable value as null instead of its real value

My understanding of how Spark distributes code to the nodes that run it is merely cursory, and I cannot get my code to run successfully with Spark's mapPartitions API when I want to instantiate a class for each partition, with…
matanster
  • 15,072
  • 19
  • 88
  • 167
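The usual remedy for driver-side state arriving as null on the executors is to construct per-partition objects *inside* the function handed to `mapPartitions`, so each worker builds its own copy instead of depending on a driver-side instance surviving serialization. This is a hedged pure-Python sketch of that pattern; `Parser` and `process_partition` are invented names, and the list-of-lists merely simulates how `mapPartitions` feeds each partition's records in as an iterator:

```python
# Per-partition state should be constructed inside the function handed to
# mapPartitions, so every executor builds its own instance rather than
# relying on one serialized from the driver.

class Parser:
    def __init__(self):
        self.prefix = "row"        # state that must exist on the executor

    def parse(self, value):
        return f"{self.prefix}:{value}"

def process_partition(records):
    parser = Parser()              # instantiated once per partition
    for record in records:
        yield parser.parse(record)

# Simulate two partitions the way mapPartitions would iterate them:
partitions = [[1, 2], [3]]
result = [list(process_partition(p)) for p in partitions]
print(result)
```

The one-instance-per-partition shape keeps construction cost low (unlike per-record instantiation) while avoiding serialization of the object altogether.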
1
vote
2 answers

ClassNotFound with Oozie, Azure HDInsight & Spark2

After researching for a week, I had to put in this request. Environment: Azure HDInsight. Oozie version: "Oozie client build version: 4.2.0.2.6.5.3004-13". Spark: Spark2. My program: a simple Scala program that reads a file, i.csv, and writes the same into…
1
vote
1 answer

Dynamic allocation with Spark Streaming on YARN not scaling down executors

I'm using Spark Streaming (Spark 2.2) on a YARN cluster and am trying to enable dynamic core allocation for my application. The number of executors scales up as required, but once executors are assigned, they are not scaled down even when…
1
vote
1 answer

Spark - Operation not allowed: alter table replace columns

It looks like Hive's replace columns is not working with Spark 2.2.1, nor with 2.3.1. alterSchemaSql : alter table myschema.mytable replace columns (a int,b int,d int) Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:…
nir
  • 3,743
  • 4
  • 39
  • 63