Why is my Spark Thrift server very slow with HTTP?

Question

My organisation set up a Spark Thrift server that is configured to use HTTP transport with SSL. The intent is to enable Power BI to retrieve data via Spark securely. However, simply retrieving schema information can take up to 10 minutes, and fetching the first 1000 rows of data takes a further 10+ minutes!

Clearly, this is unworkable so we set about on a process of elimination. We captured a huge amount of data and additional details, but I think our discoveries can be distilled down to:

1. Wireshark was used on the Power BI computer. This showed that Power BI was spending most of its time waiting for packets, not on its own processing.
2. We used the Admin UI to get the exact commands that Power BI was issuing to the Spark Thrift server: the client's commands were not efficient, but not unreasonable either.
3. Beeline was used (on another machine in the same cluster) to connect and execute the exact same commands that Power BI was executing: execution was FAST.
4. Simba ODBC drivers were used (on a workstation) to connect and execute a simple SELECT * command: execution was slow (1 second per row retrieved).
5. tcpdump was used on the Thrift server. This showed that most of the time was spent waiting for the Thrift server to send packets: taken together with #1, this suggests it is not a network latency issue.
6. We changed the server config to the 'Standard' (binary) protocol and connected with Power BI: execution was FAST!
7. We reverted the server config to 'HTTP' but without SSL: execution was SLOW.
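
The binary-versus-HTTP comparison in the list above can be repeated from any JDBC client (Beeline, for example) by changing only the connection URL. Below is a minimal sketch of how the two URLs differ, using the standard HiveServer2 JDBC URL parameters `transportMode`, `httpPath` and `ssl`. The host name, the `default` database, the `cliservice` HTTP path and port 10000 for binary mode are assumptions for illustration (only port 10001 is mentioned above); `thrift_jdbc_url` and `time_call` are hypothetical helper names, not part of any Spark or Hive API.

```python
import time
from typing import Callable


def thrift_jdbc_url(host: str, port: int, mode: str, database: str = "default",
                    ssl: bool = False, http_path: str = "cliservice") -> str:
    """Build a HiveServer2-style JDBC URL for 'binary' or 'http' transport.

    The defaults for database and http_path are assumptions; check them
    against the actual server configuration.
    """
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if mode == "http":
        url += f";transportMode=http;httpPath={http_path}"
    elif mode != "binary":
        raise ValueError(f"unknown transport mode: {mode!r}")
    if ssl:
        url += ";ssl=true"
    return url


def time_call(run_query: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by run_query()."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


if __name__ == "__main__":
    host = "thrift-host.example"  # hypothetical host name
    print(thrift_jdbc_url(host, 10000, "binary"))
    print(thrift_jdbc_url(host, 10001, "http"))
    print(thrift_jdbc_url(host, 10001, "http", ssl=True))
```

Passing each URL to the same client and wrapping the same query in `time_call` gives a like-for-like timing of the three configurations tested in steps 3, 6 and 7.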

Do these bits of information point to any holes in my elimination process or obvious potential problems that we have missed?

So this seems to point to a problem specifically with the use of HTTP (over port 10001)?

This is indeed a confusing topic. Do you have spark.sql.hive.thriftServer.singleSession=false? Try this. That said, I am a little sceptical about all this. - thebluephantom, Aug 13 '19 at 08:17

QA Collective · Answer 1 · 2019-08-23T06:36:43.870

After many weeks of looking into this, someone happened to restart a downstream YARN server that was being used to manage Spark jobs in a cluster. Suddenly, all the data being returned from the Thrift server came through lightning fast in HTTPS mode.

Turns out that the YARN server was running out of memory due to a bad garbage collection policy. So the Thrift server was responding with data slowly because the YARN server was falling over. The garbage collector was replaced entirely and reconfigured and now seems to be working okay.
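
For anyone who does have access to the host, one way to spot this kind of memory pressure is to look at the JVM's garbage collection statistics, for example with the JDK's `jstat -gcutil <pid>` tool. The sketch below parses that tool's tabular output and flags a JVM whose old generation is nearly full and which has run many full collections. It is not what we actually ran: the sample output is made up for illustration, the thresholds are arbitrary assumptions, and `parse_gcutil` and `looks_memory_starved` are hypothetical helper names.

```python
from typing import Dict


def parse_gcutil(output: str) -> Dict[str, float]:
    """Map jstat -gcutil column names to the values on the last data row."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header, last = lines[0].split(), lines[-1].split()
    # Some collectors print "-" for columns that do not apply; skip those.
    return {name: float(value) for name, value in zip(header, last) if value != "-"}


def looks_memory_starved(stats: Dict[str, float], old_gen_pct: float = 90.0,
                         min_full_gcs: int = 50) -> bool:
    """Heuristic: old generation nearly full AND many full GCs so far.

    Both thresholds are assumptions chosen for illustration only.
    """
    return stats.get("O", 0.0) >= old_gen_pct and stats.get("FGC", 0.0) >= min_full_gcs


if __name__ == "__main__":
    # Made-up sample in the jstat -gcutil column layout, not real measurements.
    sample = (
        "  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT\n"
        "  0.00   0.00 100.00  99.87  95.12  90.33   1520   35.201   412  950.774  985.975\n"
    )
    stats = parse_gcutil(sample)
    print(stats["O"], stats["FGC"], looks_memory_starved(stats))
```

A check like this only hints at a problem; the GC logs and heap settings of the affected process are still needed to confirm it.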

So I guess the moral of my story is to check the entire stack for problems and maybe just start off by rebooting everything involved (in a non-production environment) to see if that makes a difference! In my particular instance, I didn't have access to much of the underlying infrastructure involved so wasn't able to troubleshoot broadly and freely.
