
I am pretty new to Spark. I have produced a file of around 420 MB of data with a Spark job. I have a Java application that only needs to query data concurrently from that file based on certain conditions and return the data in JSON format. So far I have found two RESTful APIs for Spark, but they are only for submitting Spark jobs remotely and managing Spark contexts:

1) Livy
2) Spark Job Server

What other options are available for doing this (other than a database)?

Utkarsh Saraf

1 Answer


You can actually use Livy to get results back as friendly JSON in a RESTful way!

import json
import textwrap

import requests

host = 'http://localhost:8998'  # Livy server URL
headers = {'Content-Type': 'application/json'}

session_url = host + '/sessions/1'
statements_url = session_url + '/statements'
data = {
    'code': textwrap.dedent("""\
        val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
        val e = d.collect
        %json e
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())
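Note that the body Livy receives is plain JSON whose `code` field is the literal Scala snippet; `textwrap.dedent` runs on the client and must not appear inside the payload itself. A minimal sketch of building that payload by hand (the host URL and table name are placeholders; only the payload construction is shown, since the actual POST needs a running Livy server):

```python
import json
import textwrap

# Hypothetical Livy endpoint -- adjust to your server.
host = "http://localhost:8998"
statements_url = host + "/sessions/1/statements"

# The Scala snippet is dedented client-side, then embedded as a plain
# string in the JSON body. Tools like Postman should receive this
# already-serialized form, not a string containing "textwrap.dedent(...)".
code = textwrap.dedent("""\
    val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
    val e = d.collect
    %json e
    """)
payload = json.dumps({"code": code})

# Round-trip check: the server-side decode must recover the exact
# snippet, inner double quotes included.
decoded = json.loads(payload)
print(decoded["code"].splitlines()[0])
```

This also explains why pasting `textwrap.dedent(...)` verbatim into a request body fails: the server sees it as literal text, not as Python to evaluate.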

My reference answer: Apache Livy: query Spark SQL via REST: possible?

Related: Livy Server: return a dataframe as JSON?

Garren S
  • Thanks Garren. Is there any size or length limit for the JSON that can be created and sent back to the application with this approach? – Utkarsh Saraf Oct 12 '17 at 11:05
  • I don't know if there is a size limit, but there are some inherent constraints since it has to do a "collect" of the results back to the driver. If you have the time to wait for it to build and return a million-row result, it may well oblige ;) – Garren S Oct 12 '17 at 14:45
  • I am posting data in the request body in `postman` as `{ "code":"textwrap.dedent(\"\"\" val d = spark.sql(\"SELECT COUNT(DISTINCT food_item) FROM food_item_tbl\") val e = d.collect \%json e \"\"\")}`. It is not working. Am I missing something here? – Utkarsh Saraf Oct 18 '17 at 08:54
  • @UtkarshSaraf I don't know. Could you ask that as a question with the error and any other information? – Garren S Oct 18 '17 at 18:38