
I am using Spark version 1.6.0 with Python. I found that window functions do not seem to be supported by the version of Spark I am using: when I tried to use a window function in my query (via Spark SQL), I got an error saying 'you need to build spark with hive functionality'. Following that, I searched around and found suggestions that I need to use Spark version 1.4.0, which I tried with no luck. Some posts also suggested building Spark with Hive functionality, but I could not find the right way to do it.
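Roughly what I am running, simplified and with placeholder table and column names, is something like this:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="window-test")
sqlContext = SQLContext(sc)   # plain SQLContext, no Hive

df = sqlContext.createDataFrame([("a", 1), ("a", 3), ("b", 2)], ["grp", "value"])
df.registerTempTable("my_table")

# window function in plain Spark SQL
sqlContext.sql("""
    SELECT grp, value,
           rank() OVER (PARTITION BY grp ORDER BY value) AS rnk
    FROM my_table
""").show()
# fails with an error telling me to build Spark with Hive functionality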
When I used Spark 1.4.0, I got the following error:

raise ValueError("invalid mode %r (only r, w, b allowed)")
ValueError: invalid mode %r (only r, w, b allowed)
16/04/04 14:17:17 WARN PythonRDD: Incomplete task interrupted: Attempting to kill Python Worker
16/04/04 14:17:17 INFO HadoopRDD: Input split: file:/C:/Users/test
esktop/spark-1.4.0-bin-hadoop2.4/test:910178+910178
16/04/04 14:17:17 INFO Executor: Executor killed task 1.0 in stage 1.0 (TID 2)
16/04/04 14:17:17 WARN TaskSetManager: Lost task 1.0 in stage 1.0 (TID 2, localhost): TaskKilled (killed intentionally)
16/04/04 14:17:17 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool

1 Answer


I think this is the third time I have answered a similar question:

Window functions are supported with HiveContext, not the regular SQLContext.
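For example, a minimal PySpark sketch (with made-up data and column names) of a window function over a HiveContext looks like this:

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.window import Window
from pyspark.sql import functions as F

sc = SparkContext(appName="window-demo")
sqlContext = HiveContext(sc)  # HiveContext, not SQLContext, enables window functions

df = sqlContext.createDataFrame([("a", 1), ("a", 3), ("b", 2)], ["grp", "value"])

# rank rows within each group, ordered by value
w = Window.partitionBy("grp").orderBy("value")
df.select("grp", "value", F.rank().over(w).alias("rank")).show()

This requires a Spark build that includes Hive support, which is what the rest of this answer covers.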

As for how to build Spark with Hive support, the answer is in the official Building Spark documentation:

Building with Hive and JDBC Support

To enable Hive integration for Spark SQL along with its JDBC server and CLI, add the -Phive and -Phive-thriftserver profiles to your existing build options. By default Spark will build with Hive 0.13.1 bindings.

Apache Hadoop 2.4.X with Hive 13 support (example):

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package

Building for Scala 2.11

To produce a Spark package compiled with Scala 2.11, use the -Dscala-2.11 property:

./dev/change-scala-version.sh 2.11
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package

There is no magic here; everything is in the documentation.

  • Just added 'install' after 'clean' in the build command above: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean install package, and Spark was built in one and a half hours. – Jay Fifadra Apr 05 '16 at 05:18
  • The package takes a long time to build, but the install option isn't going to install Spark; it's going to copy the created jar into your .m2 repository. – eliasah Apr 05 '16 at 05:20