Questions tagged [apache-spark-1.5]

Use for questions specific to Apache Spark 1.5. For general questions related to Apache Spark use the tag [apache-spark].

45 questions
27 votes, 3 answers

Convert null values to empty array in Spark DataFrame

I have a Spark DataFrame where one column is an array of integers. The column is nullable because it comes from a left outer join. I want to convert all null values to an empty array so I don't have to deal with nulls later. I thought I could…
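In Spark this is usually solved with a `when(col.isNull(), …).otherwise(col)` expression or a small UDF. The underlying idea, sketched in plain Python with hypothetical sample data (not the asker's actual DataFrame):

```python
def null_to_empty_array(value):
    """Mimic COALESCE(col, ARRAY()): replace a missing array
    with an empty list so downstream code never sees None."""
    return value if value is not None else []

# Hypothetical column values from a left outer join.
rows = [[1, 2, 3], None, [], None]
cleaned = [null_to_empty_array(v) for v in rows]
```

After this step every row holds a real (possibly empty) list, so later aggregations need no null checks.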
24 votes, 6 answers

"INSERT INTO ..." with SparkSQL HiveContext

I'm trying to run an insert statement with my HiveContext, like this: hiveContext.sql('insert into my_table (id, score) values (1, 10)') The Spark SQL 1.5.2 documentation doesn't explicitly state whether this is supported, although it does…
11 votes, 1 answer

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql. Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom…
Wanchun (165)
8 votes, 2 answers

How to limit decimal values to 2 digits before applying agg function?

I am following this solution from one of the Stack Overflow posts; my only question is how I can limit the values that I want to sum to 2 digits after the decimal point before applying the df.agg(sum()) function. For example: I have values like…
Explorer (1,491)
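One common approach is to round each value to 2 decimal places before aggregating (in Spark this would be a `round(col, 2)` inside the aggregation). The arithmetic, sketched in plain Python with made-up sample values:

```python
# Hypothetical values standing in for the DataFrame column.
values = [10.1234, 5.6789, 3.14159]

# Round each value to 2 decimal places *before* summing,
# mirroring round(col, 2) applied ahead of df.agg(sum(...)).
rounded_sum = sum(round(v, 2) for v in values)
# 10.12 + 5.68 + 3.14
```

Note that rounding before versus after the sum gives different results; the question asks for rounding first, which this mirrors.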
8 votes, 3 answers

Spark job execution time

This might be a very simple question, but is there a simple way to measure the execution time of a Spark job (submitted using spark-submit)? It would help us profile Spark jobs based on the size of the input data. EDIT: I use…
pranav3688 (694)
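A minimal way to get wall-clock time is to wrap the `spark-submit` invocation from the driver side. A sketch (the spark-submit arguments shown in the comment are assumptions, not from the question):

```python
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a command and return its exit code and wall-clock seconds."""
    start = time.time()
    code = subprocess.call(cmd)
    return code, time.time() - start

# Trivial stand-in command; in practice this would be something like
# ["spark-submit", "--master", "yarn", "my_job.py"] (paths assumed).
code, elapsed = timed_run([sys.executable, "-c", "pass"])
```

For per-stage timing rather than end-to-end wall clock, the Spark web UI and event logs give a much finer breakdown.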
7 votes, 1 answer

Save Spark Dataframe into Elasticsearch - Can’t handle type exception

I have designed a simple job to read data from MySQL and save it in Elasticsearch with Spark. Here is the code: JavaSparkContext sc = new JavaSparkContext( new SparkConf().setAppName("MySQLtoEs") .set("es.index.auto.create",…
eliasah (39,588)
7 votes, 1 answer

Saving / exporting transformed DataFrame back to JDBC / MySQL

I'm trying to figure out how to use the new DataFrameWriter to write data back to a JDBC database. I can't seem to find any documentation for this, although looking at the source code it seems like it should be possible. A trivial example of what…
Matt Zukowski (4,469)
5 votes, 2 answers

Options to read large files (pure text, xml, json, csv) from hdfs in RStudio with SparkR 1.5

I am new to Spark and would like to know whether there are options other than the ones below to read data stored in HDFS from RStudio using SparkR, or whether I am using them correctly. The data could be of any kind (plain text, csv, json, xml, or any database…
4711 (61)
4 votes, 3 answers

How to work with Apache Spark using IntelliJ IDEA?

What is the best way to work with Apache Spark using IntelliJ IDEA (especially for the Scala programming language)? Please explain step by step if you can. Thanks for any answer.
3 votes, 0 answers

Field delimiter of Hive table not recognized by spark HiveContext

I have created a Hive external table stored as textfile, partitioned by event_date Date. How do we specify a particular CSV format when reading the Hive table in Spark? The environment: Spark 1.5.0 - cdh5.5.1, using Scala version…
3 votes, 3 answers

How to transpose dataframe in Spark 1.5 (no pivot operator available)?

I want to transpose the following table using Spark Scala without the pivot function. I am using Spark 1.5.1, and the pivot function is not supported in 1.5.1. Please suggest a suitable method to transpose the following table: Customer Day Sales 1 Mon 12 1…
Nikhil (57)
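Without a pivot operator, the usual workaround is to group by the key and build one output column per distinct day. The reshaping logic, sketched in plain Python on the sample data from the question (in Spark 1.5 this would typically be a groupBy over RDD pairs):

```python
from collections import defaultdict

# (customer, day, sales) rows as in the question's table.
rows = [
    (1, "Mon", 12),
    (1, "Tue", 10),
    (2, "Mon", 15),
]

# Build one record per customer with a column per day.
pivoted = defaultdict(dict)
for customer, day, sales in rows:
    pivoted[customer][day] = sales
```

Each customer now maps to a day-to-sales dictionary, which is exactly the wide shape a pivot would produce.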
3 votes, 2 answers

Can I have a master and worker on same node?

I have a 3-node Spark standalone cluster, and on the master node I also have a worker. When I submit an app to the cluster, the two other workers start RUNNING, but the worker on the master node stays with status LOADING, and eventually another worker is…
3 votes, 1 answer

Apache Spark dataframe createJDBCTable exception

Related to saving to JDBC: I am trying to import a text file and save it to a Hive JDBC file for import by reporting tools. We are running spark-1.5.1-bin-hadoop2.6 (master + 1 slave), the JDBC thrift server, and the beeline client. They all seem to…
JP-SD (41)
3 votes, 4 answers

sbt-assembly: Merge Errors - Deduplicate

I am getting these errors using sbt assembly. I am using Spark which seems to be at the root of this problem. val Spark = Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, …
BAR (15,909)
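Deduplicate errors from sbt-assembly usually come from multiple dependencies shipping the same META-INF files. A commonly used merge strategy, sketched for sbt-assembly 0.14.x (an assumption; marking the Spark dependencies as "provided" also avoids most duplicates, since the cluster supplies them at runtime):

```scala
// build.sbt — assumed sbt-assembly 0.14.x syntax.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard // drop duplicate manifests
  case _                             => MergeStrategy.first   // otherwise keep first copy
}
```

Discarding META-INF wholesale can break service-loader files, so some setups special-case `META-INF/services` before falling back to `discard`.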
2 votes, 0 answers

Error while defining dictionary in spark 1.5.0, python 2.6

I am running Cloudera Spark 1.5.0 with Python 2.6.6. I have defined 3 functions like this: def tf(tokens): """ Compute Term/Token Frequency Args: tokens (list of str): input list of tokens from tokenize Returns: dictionary: a…
Hardik Gupta (4,700)
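The excerpt's docstring suggests a token-to-frequency mapping. A sketch of such a `tf` function that stays within Python 2.6 syntax (no dict comprehensions), with the exact return shape an assumption based on the truncated docstring:

```python
def tf(tokens):
    """Compute term frequency: token -> count / total tokens."""
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    total = float(len(tokens))  # float() forces real division on Python 2.6
    # dict((k, v) for ...) instead of a dict comprehension: 2.6-compatible.
    return dict((t, c / total) for t, c in counts.items())

freqs = tf(["a", "b", "a", "a"])
```

Dict comprehensions (`{k: v for ...}`) only arrived in Python 2.7, which is a frequent source of SyntaxErrors when code written for newer interpreters runs on 2.6.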