
The Apache Livy documentation is sparse: is it possible to return Spark SQL query result sets as REST calls using Apache Livy? The calling application is mobile and cannot use ODBC/JDBC to connect, so the Spark Thrift Server is not an option.

MarkTeehan
  • Yes, it is possible to query using Spark SQL via Livy, but it is not currently possible to issue pure SQL. The SQL needs to be wrapped in Python, Scala or Java. The Livy results can be formatted as JSON using relatively undocumented features. I'll provide a better answer when back on my laptop :) – Garren S Aug 16 '17 at 04:39

2 Answers


Yes, it is possible to submit Spark SQL queries through Livy. However, there is currently no support for submitting the queries on their own; they need to be wrapped in Python or Scala code.

Here are two examples that use Python (with the requests library) to interact with Livy, passing Scala code as a string to be executed in Spark:

1) Using the %json magic in Livy (https://github.com/apache/incubator-livy/blob/412ccc8fcf96854fedbe76af8e5a6fec2c542d25/repl/src/test/scala/org/apache/livy/repl/PythonInterpreterSpec.scala#L91):

import json
import textwrap

import requests

host = 'http://<livy-host>:8998'                 # 8998 is Livy's default port
headers = {'Content-Type': 'application/json'}

session_url = host + "/sessions/1"
statements_url = session_url + '/statements'
data = {
        'code': textwrap.dedent("""\
        val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
        val e = d.collect
        %json e
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())
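
The POST response includes the statement id; you then poll the statement until it has finished and read the %json payload, which Livy returns under the 'application/json' key of output.data. A minimal sketch, reusing the host, headers and statements_url from above:

import time

stmt_id = r.json()['id']
statement_url = "{}/{}".format(statements_url, stmt_id)

# Poll until Livy has finished executing the statement
while True:
    stmt = requests.get(statement_url, headers=headers).json()
    if stmt['state'] in ('available', 'error'):
        break
    time.sleep(1)

# The %json result lands under the 'application/json' mimetype key
print(stmt['output']['data'].get('application/json'))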

2) Using the %table magic in Livy (https://github.com/apache/incubator-livy/blob/412ccc8fcf96854fedbe76af8e5a6fec2c542d25/repl/src/test/scala/org/apache/livy/repl/PythonInterpreterSpec.scala#L105):

# same imports, host and headers as in the first example
session_url = host + "/sessions/21"
statements_url = session_url + '/statements'
data = {
        'code': textwrap.dedent("""\
        val x = List((1, "a", 0.12), (3, "b", 0.63))
        %table x
        """)}
r = requests.post(statements_url, data=json.dumps(data), headers=headers)
print(r.json())
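
Reading the result works the same way as the polling sketch above, except that (as far as I can tell from the Livy repl tests linked above) the %table output is keyed by the Livy table mimetype and split into headers and rows:

# Once the statement state is 'available' (poll as in the %json sketch above)
stmt = requests.get(statements_url + '/' + str(r.json()['id']), headers=headers).json()
table = stmt['output']['data'].get('application/vnd.livy.table.v1+json', {})
print(table.get('headers'))  # column names and types
print(table.get('data'))     # rows, e.g. [[1, "a", 0.12], [3, "b", 0.63]]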
Garren S

If you're using Livy 0.7.0 or up, you don't really need magic strings.

  1. Create a session:
curl --location --request POST 'http://<host>:<port>/sessions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "kind": "sql",
    "proxyUser": "cooL_user"
}'
  2. Run a query:
curl --location --request POST 'http://<host>:<port>/sessions/<sessionid>/statements' \
--header 'Content-Type: application/json' \
--data-raw '{
    "code": "select 1"
}'
  3. Periodically poll for the result:
curl --location --request GET 'http://<host>:<port>/sessions/<sessionid>/statements/0' | jq '.output.data["application/json"].data'

That's it; now you can use this approach from any language. There's already a Python library called pylivy which roughly follows this approach under the hood.
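
For example, here is a minimal sketch of the same three steps with plain Python requests (host, port and the query are placeholders; add proxyUser etc. to the session payload as needed):

import json
import time

import requests

livy = "http://<host>:<port>"
headers = {"Content-Type": "application/json"}

# 1. Create a sql-kind session and wait until it is idle
session = requests.post(livy + "/sessions", headers=headers,
                        data=json.dumps({"kind": "sql"})).json()
session_url = "{}/sessions/{}".format(livy, session["id"])
while requests.get(session_url, headers=headers).json()["state"] != "idle":
    time.sleep(2)

# 2. Submit the query as a statement
stmt = requests.post(session_url + "/statements", headers=headers,
                     data=json.dumps({"code": "select 1"})).json()
stmt_url = "{}/statements/{}".format(session_url, stmt["id"])

# 3. Poll until the result is available, then read the JSON payload
while True:
    stmt = requests.get(stmt_url, headers=headers).json()
    if stmt["state"] in ("available", "error"):
        break
    time.sleep(1)
print(stmt["output"]["data"].get("application/json"))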

sbrk
  • This didn't work for me, got `Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.` You're probably assuming the environment is set up in a particular way? – Michał Zawadzki Feb 15 '23 at 17:38