Questions tagged [apache-spark-2.1.1]

10 questions
3
votes
3 answers

Unable to load pyspark inside virtualenv

I installed pyspark in a Python virtualenv. I also installed JupyterLab, which was newly released (http://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html), in the virtualenv. I was unable to start pyspark within a…
Pranay Aryal
  • 5,208
  • 4
  • 30
  • 41
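For context, a minimal setup along the lines the question describes might look like the following (package names are the current pip names; the question predates some of them, so treat this as a sketch):

```shell
# Sketch: installing pyspark and jupyterlab into a fresh virtualenv.
# Assumes python3 and a Java runtime are already on PATH --
# pyspark needs Java even when installed via pip.
python3 -m venv spark-env
source spark-env/bin/activate
pip install pyspark jupyterlab

# Launch the pyspark shell from inside the venv
pyspark
```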
2
votes
0 answers

Spark2 Datetime lookup efficient datastructure

I have a Spark application with records that contain the following fields: Hash, a unique identifier for the item; Location, the location of the item; From, the date on which the item was first seen in that location; To, null if it is still there or…
Jon Taylor
  • 7,865
  • 5
  • 30
  • 55
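Independent of Spark, the lookup the question describes (where was item X at time T?) is essentially an interval-membership test per item hash. A pure-Python sketch of that structure, with all names hypothetical and no claim about the asker's actual schema:

```python
# Sketch of an interval-lookup structure for records of the form
# (hash, location, from_date, to_date), where to_date=None means
# "still there". Not Spark code -- just the lookup logic the
# question is trying to make efficient.
from collections import defaultdict

class LocationIndex:
    def __init__(self):
        # item hash -> list of (location, from_date, to_date) intervals
        self._by_hash = defaultdict(list)

    def add(self, item_hash, location, from_date, to_date=None):
        self._by_hash[item_hash].append((location, from_date, to_date))

    def location_at(self, item_hash, when):
        """Return the item's location at time `when`, or None."""
        for location, start, end in self._by_hash[item_hash]:
            # half-open interval: [start, end); open-ended if end is None
            if start <= when and (end is None or when < end):
                return location
        return None
```

With many intervals per key, the inner scan could be replaced by a bisect over intervals sorted on `from_date`; the linear version keeps the idea visible.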
2
votes
0 answers

Assigning the spark-deep-learning external jar to Spark with Python on Amazon EMR

I've been trying to get the spark-deep-learning library working on my EMR cluster so that I can read images in parallel with Python 2.7. I have been searching for quite some time and have failed to find a solution. I have tried…
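For background, external Scala/Java packages are usually attached to a PySpark job via `--packages` or `--jars`. The coordinates below are an assumption and should be checked against spark-packages.org for the build matching your Spark version:

```shell
# Sketch: attaching an external package to a PySpark job on EMR.
# The version/Scala suffix here is illustrative, not authoritative.
spark-submit \
  --packages databricks:spark-deep-learning:0.3.0-spark2.2-s_2.11 \
  my_image_job.py

# Alternatively, ship a pre-downloaded jar explicitly:
# spark-submit --jars /home/hadoop/spark-deep-learning.jar my_image_job.py
```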
1
vote
0 answers

Uneven distribution of tasks among the spark executors

I am using spark-streaming 2.2.1 in production. In this application I read data from RabbitMQ, do further processing, and finally save it to Cassandra. I am facing a strange issue where the number of tasks is not evenly…
Naresh
  • 5,073
  • 12
  • 67
  • 124
1
vote
1 answer

Is it possible to expose/add custom APIs to the existing Spark driver's REST endpoints?

Spark exposes certain API endpoints (usually mounted at /api/v1). Is there some way to expose custom endpoints using the same Spark server? (Using Spark 2.1.1, Structured Streaming)
1
vote
0 answers

Writing the output of Batch Queries to Kafka for Spark version 2.1.1

Can somebody give me pointers on how I can load the output of batch queries into Kafka? I researched a lot on Stack Overflow and in other articles, but was unable to find anything for Spark 2.1.1. For higher versions of Spark, there is an easy way to…
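For background: the built-in Kafka batch sink (`df.write.format("kafka")`) only arrived in Spark 2.2, so on 2.1.1 a common workaround is to send rows yourself inside `foreachPartition`. A hedged sketch, with the producer put behind a factory so the send logic is visible on its own; in practice the factory would return kafka-python's `KafkaProducer`, and all names here are illustrative:

```python
import json

def send_partition(rows, producer_factory, topic="events"):
    """Send one partition's rows to Kafka as JSON.

    `producer_factory` must return an object with send(topic, value)
    and flush() methods -- e.g. a lambda wrapping kafka-python's
    KafkaProducer. Creating the producer inside this function matters:
    it has to be constructed on the executor, not the driver, because
    producers are not serializable.
    """
    producer = producer_factory()
    for row in rows:
        producer.send(topic, json.dumps(row, sort_keys=True).encode("utf-8"))
    producer.flush()

# In a Spark 2.1.1 batch job this would be wired up roughly as:
#   df.rdd.map(lambda r: r.asDict()) \
#     .foreachPartition(lambda rows: send_partition(rows, make_producer))
```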
1
vote
0 answers

Issue with try and except block in pyspark

I use Spark 2.1. Below is my code:

delta = "insert overwrite table schema1.table1 select * from schema2.table2"
try:
    spark.sql(delta)
except Exception as e:
    spark.sql("drop table schema2.table2")
…
0
votes
1 answer

Saved Model : LinearRegression does not seem to work

I am using Azure and the Spark version is 2.1.1.2.6.2.3-1. I have saved my model using the following code:

def fit_LR(training, testing, adl_root_path, location, modelName):
    training.cache()
    lr = LinearRegression(featuresCol =…
E B
  • 1,073
  • 3
  • 23
  • 36
0
votes
1 answer

Pyspark read data - java.util.NoSuchElementException: spark.sql.execution.pandas.respectSessionTimeZone

I have a program that works from the command line, but I'm trying to set up PyCharm to test its functionality piece by piece. I must have configured something wrong, because whenever I try to read any data (whether a Hive query or a CSV), I get…
Laurent
  • 1,914
  • 2
  • 11
  • 25
0
votes
2 answers

How to use a window function to count day-of-week occurrences in PySpark 2.1

With the below PySpark dataset (2.1), how do you use a window function to count the number of times the current record's day of week appeared in the last 28 days? Example data frame:

from pyspark.sql import functions as F
df =…
Micah Pearce
  • 1,805
  • 3
  • 28
  • 61
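As a plain-Python restatement of what that window has to compute (before translating it into a PySpark window), with the boundary choices below being one possible reading of "the last 28 days":

```python
from datetime import date, timedelta

def same_weekday_in_last_28(dates):
    """For each date, count earlier entries within the previous 28 days
    that fall on the same day of week (current row excluded).

    Pure-Python sketch of the logic only. In PySpark 2.1 the equivalent
    would be a count over a Window partitioned by day-of-week and
    bounded with rangeBetween on a seconds-based (unix timestamp)
    column; exact boundaries depend on the reading of "last 28 days".
    """
    counts = []
    for i, d in enumerate(dates):
        window_start = d - timedelta(days=28)
        counts.append(sum(
            1 for e in dates[:i]
            if window_start <= e < d and e.weekday() == d.weekday()
        ))
    return counts
```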