Questions tagged [apache-spark-1.5]

Use for questions specific to Apache Spark 1.5. For general questions related to Apache Spark use the tag [apache-spark].

45 questions
27 votes, 3 answers

Convert null values to empty array in Spark DataFrame

I have a Spark DataFrame where one column is an array of integers. The column is nullable because it comes from a left outer join. I want to convert all null values to an empty array so I don't have to deal with nulls later. I thought I could…
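In Spark this is usually solved with a `when(col.isNull(), …).otherwise(col)` expression or a small UDF. The underlying idea, sketched in plain Python with hypothetical sample data (not the asker's actual DataFrame):

```python
def null_to_empty_array(value):
    """Mimic COALESCE(col, ARRAY()): replace a missing array
    with an empty list so downstream code never sees None."""
    return value if value is not None else []

# Hypothetical column values from a left outer join.
rows = [[1, 2, 3], None, [], None]
cleaned = [null_to_empty_array(v) for v in rows]
```

After this step every row holds a real (possibly empty) list, so later aggregations need no null checks.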
24 votes, 6 answers

"INSERT INTO ..." with SparkSQL HiveContext

I'm trying to run an insert statement with my HiveContext, like this: hiveContext.sql('insert into my_table (id, score) values (1, 10)') The Spark SQL 1.5.2 documentation doesn't explicitly state whether this is supported, although it does…
11 votes, 1 answer

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql. Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom…
Wanchun (165)
8 votes, 2 answers

How to limit decimal values to 2 digits before applying agg function?

I am following this solution from one of the Stack Overflow posts; my only question is how I can limit the values that I want to sum to 2 digits after the decimal point before applying the df.agg(sum()) function. For example: I have values like…
Explorer (1,491)
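One common approach is to round each value to 2 decimal places before aggregating (in Spark this would be a `round(col, 2)` inside the aggregation). The arithmetic, sketched in plain Python with made-up sample values:

```python
# Hypothetical values standing in for the DataFrame column.
values = [10.1234, 5.6789, 3.14159]

# Round each value to 2 decimal places *before* summing,
# mirroring round(col, 2) applied ahead of df.agg(sum(...)).
rounded_sum = sum(round(v, 2) for v in values)
# 10.12 + 5.68 + 3.14
```

Note that rounding before versus after the sum gives different results; the question asks for rounding first, which this mirrors.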
8 votes, 3 answers

Spark job execution time

This might be a very simple question, but is there a simple way to measure the execution time of a Spark job (submitted using spark-submit)? It would help us profile Spark jobs based on the size of the input data. EDIT: I use…
pranav3688 (694)
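A minimal way to get wall-clock time is to wrap the `spark-submit` invocation from the driver side. A sketch (the spark-submit arguments shown in the comment are assumptions, not from the question):

```python
import subprocess
import sys
import time

def timed_run(cmd):
    """Run a command and return its exit code and wall-clock seconds."""
    start = time.time()
    code = subprocess.call(cmd)
    return code, time.time() - start

# Trivial stand-in command; in practice this would be something like
# ["spark-submit", "--master", "yarn", "my_job.py"] (paths assumed).
code, elapsed = timed_run([sys.executable, "-c", "pass"])
```

For per-stage timing rather than end-to-end wall clock, the Spark web UI and event logs give a much finer breakdown.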
7 votes, 1 answer

Save Spark Dataframe into Elasticsearch - Can’t handle type exception

I have designed a simple job to read data from MySQL and save it in Elasticsearch with Spark. Here is the code: JavaSparkContext sc = new JavaSparkContext( new SparkConf().setAppName("MySQLtoEs") .set("es.index.auto.create",…
eliasah (39,588)
7 votes, 1 answer

Saving / exporting transformed DataFrame back to JDBC / MySQL

I'm trying to figure out how to use the new DataFrameWriter to write data back to a JDBC database. I can't seem to find any documentation for this, although looking at the source code it seems like it should be possible. A trivial example of what…
Matt Zukowski (4,469)
5 votes, 2 answers

Options to read large files (pure text, xml, json, csv) from hdfs in RStudio with SparkR 1.5

I am new to Spark and would like to know whether there are options other than the ones below to read data stored in HDFS from RStudio using SparkR, or whether I am using them correctly. The data could be of any kind (plain text, csv, json, xml, or any database…
4711 (61)
4 votes, 3 answers

How to work with Apache Spark using IntelliJ IDEA?

What is the best way to work with Apache Spark using IntelliJ IDEA (especially for the Scala programming language)? Please explain step by step if you can. Thanks for any answer.
3 votes, 0 answers

Field delimiter of Hive table not recognized by spark HiveContext

I have created a Hive external table stored as textfile, partitioned by event_date Date. How do we specify a particular CSV format when reading the Hive table in Spark? The environment: Spark 1.5.0 - cdh5.5.1, using Scala version…
3 votes, 3 answers

How to transpose dataframe in Spark 1.5 (no pivot operator available)?

I want to transpose the following table using Spark Scala without the pivot function. I am using Spark 1.5.1, and the pivot function is not supported in 1.5.1. Please suggest a suitable method to transpose the following table: Customer Day Sales 1 Mon 12 1…
Nikhil (57)
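Without a pivot operator, the usual workaround is to group by the key and build one output column per distinct day. The reshaping logic, sketched in plain Python on the sample data from the question (in Spark 1.5 this would typically be a groupBy over RDD pairs):

```python
from collections import defaultdict

# (customer, day, sales) rows as in the question's table.
rows = [
    (1, "Mon", 12),
    (1, "Tue", 10),
    (2, "Mon", 15),
]

# Build one record per customer with a column per day.
pivoted = defaultdict(dict)
for customer, day, sales in rows:
    pivoted[customer][day] = sales
```

Each customer now maps to a day-to-sales dictionary, which is exactly the wide shape a pivot would produce.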
3 votes, 2 answers

Can I have a master and worker on same node?

I have a 3-node Spark standalone cluster, and on the master node I also have a worker. When I submit an app to the cluster, the two other workers start RUNNING, but the worker on the master node stays with status LOADING, and eventually another worker is…
3 votes, 1 answer

Apache Spark dataframe createJDBCTable exception

Related to saving to JDBC: I am trying to import a text file and save it to a Hive JDBC file for import by reporting tools. We are running spark-1.5.1-bin-hadoop2.6 (master + 1 slave), the JDBC thrift server, and the beeline client. They all seem to…
JP-SD (41)
3 votes, 4 answers

sbt-assembly: Merge Errors - Deduplicate

I am getting these errors using sbt assembly. I am using Spark which seems to be at the root of this problem. val Spark = Seq( "org.apache.spark" %% "spark-core" % sparkVersion, "org.apache.spark" %% "spark-sql" % sparkVersion, …
BAR (15,909)
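Deduplicate errors from sbt-assembly usually come from multiple dependencies shipping the same META-INF files. A commonly used merge strategy, sketched for sbt-assembly 0.14.x (an assumption; marking the Spark dependencies as "provided" also avoids most duplicates, since the cluster supplies them at runtime):

```scala
// build.sbt — assumed sbt-assembly 0.14.x syntax.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard // drop duplicate manifests
  case _                             => MergeStrategy.first   // otherwise keep first copy
}
```

Discarding META-INF wholesale can break service-loader files, so some setups special-case `META-INF/services` before falling back to `discard`.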
2 votes, 0 answers

Error while defining dictionary in spark 1.5.0, python 2.6

I am running Cloudera Spark 1.5.0 with Python 2.6.6. I have defined 3 functions like this: def tf(tokens): """ Compute Term/Token Frequency Args: tokens (list of str): input list of tokens from tokenize Returns: dictionary: a…
Hardik Gupta (4,700)
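The excerpt's docstring suggests a token-to-frequency mapping. A sketch of such a `tf` function that stays within Python 2.6 syntax (no dict comprehensions), with the exact return shape an assumption based on the truncated docstring:

```python
def tf(tokens):
    """Compute term frequency: token -> count / total tokens."""
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    total = float(len(tokens))  # float() forces real division on Python 2.6
    # dict((k, v) for ...) instead of a dict comprehension: 2.6-compatible.
    return dict((t, c / total) for t, c in counts.items())

freqs = tf(["a", "b", "a", "a"])
```

Dict comprehensions (`{k: v for ...}`) only arrived in Python 2.7, which is a frequent source of SyntaxErrors when code written for newer interpreters runs on 2.6.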