I'm trying to connect to an Apache Solr datasource from Superset. As I understand it, Solr is written in Java, Superset is developed in Python, and there is no dialect for Solr in SQLAlchemy.
2 Answers
You can't create a Superset datasource for Solr out of the box as (to your point) there is no SQLAlchemy dialect for Solr.
Note that SQLAlchemy (and Superset) wouldn't care whether that datasource is written in Java (or Fortran, for that matter) - as long as there is a functional SQLAlchemy dialect and a Python driver.
That being said, the reason a SQLAlchemy dialect doesn't exist for Solr is that the two are built for different purposes and based on different structures.
Your best bet is probably to implement some type of data extraction process, to get the data you need out of Solr, and put it into a supported database.
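Such an extraction process could be as simple as paging through Solr's `/select` endpoint and loading the results into a relational store. A minimal sketch, assuming a hypothetical `products` core with `id`, `name`, and `price` fields and a SQLite table of the same shape (SQLite is just a stand-in for any Superset-supported database):

```python
import json
import sqlite3
import urllib.request

FIELDS = ("id", "name", "price")  # hypothetical fields; adjust to your schema

def docs_to_rows(docs, fields=FIELDS):
    """Flatten Solr documents into plain tuples for relational insertion.
    Solr often returns single values wrapped in one-element lists."""
    rows = []
    for doc in docs:
        row = []
        for f in fields:
            v = doc.get(f)
            if isinstance(v, list):
                v = v[0] if v else None
            row.append(v)
        rows.append(tuple(row))
    return rows

def sync(solr_select_url, db_path):
    # e.g. http://localhost:8983/solr/products/select?q=*:*&rows=1000&wt=json
    with urllib.request.urlopen(solr_select_url) as resp:
        docs = json.load(resp)["response"]["docs"]
    conn = sqlite3.connect(db_path)
    with conn:  # commit/rollback as one transaction
        conn.executemany(
            "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
            docs_to_rows(docs),
        )
    conn.close()
```

In a real job you'd page with `start`/`rows` (or cursor marks) instead of a single request, but the shape of the process is the same: query Solr, flatten the documents, bulk-insert into a database Superset can speak to.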

- Thanks for the response. You mentioned "the reason a SQLAlchemy dialect doesn't exist for Solr is that the two are built for different purposes and based on different structures". Could you give a bit more detail on that? – Ben Tennyson Dec 18 '17 at 16:45
- SQLAlchemy is about mapping objects to relational tables in a structured database; its ultimate concern is offering persistence and retrieval of structured objects along ACID principles. Solr, on the other hand, is concerned with document indexing and search. That doesn't mean it can't be used as a persistence system (see https://stackoverflow.com/questions/3215029/nosql-mongodb-vs-lucene-or-solr-as-your-database) - but that is far from the use case of mapping objects with relationships along ACID principles. – David Tobiano Dec 18 '17 at 20:27
- I suppose it would make sense to interface with Solr's /sql or /stream interfaces, but then you'd need to implement a plugin, I guess? See https://lucene.apache.org/solr/guide/7_7/parallel-sql-interface.html – Cominvent Mar 28 '19 at 11:41
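For what the comment's idea would look like in practice: Solr's Parallel SQL interface accepts a POST to `/solr/<collection>/sql` with the SQL statement in a `stmt` parameter (see the guide linked above). A small stdlib-only sketch, with a hypothetical `products` collection:

```python
import json
import urllib.parse
import urllib.request

def build_sql_request(base_url, collection, stmt):
    """Build the URL and form body for Solr's Parallel SQL endpoint.
    Solr expects the SQL statement in the `stmt` parameter."""
    url = f"{base_url}/solr/{collection}/sql"
    body = urllib.parse.urlencode({"stmt": stmt}).encode()
    return url, body

def run_sql(base_url, collection, stmt):
    url, body = build_sql_request(base_url, collection, stmt)
    with urllib.request.urlopen(url, data=body) as resp:
        # Results come back as a stream of JSON tuples
        return json.load(resp)

# run_sql("http://localhost:8983", "products",
#         "SELECT id, price FROM products LIMIT 10")
```

A Superset datasource would still need a dialect wrapped around calls like this, which is the plugin work the comment alludes to.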
Absolutely. You can use Spark-Solr with a Spark Thrift server running, and connect Superset to the Thrift server. This stack worked for me.
Spark-Solr (see the project's GitHub repository) is a powerful library for creating DataFrames from a Solr index. You can even write streaming expressions to join multiple collections. Spark Thrift provides a JDBC connection to your Spark engine.
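A rough sketch of that stack, with placeholder hostnames and a hypothetical `products` collection: read the collection with spark-solr, register it as a view visible through the Thrift server, then point Superset's SQLAlchemy URI at the Thrift server (PyHive-style `hive://` URI):

```python
def load_solr_collection(zkhost, collection):
    """Read a Solr collection into a Spark DataFrame via spark-solr.
    pyspark (plus the spark-solr package on the Spark classpath) is only
    needed when this actually runs, so the import is deferred."""
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("solr-to-superset").getOrCreate()
    return (spark.read.format("solr")
            .option("zkhost", zkhost)
            .option("collection", collection)
            .load())

def superset_uri(host, port=10000, database="default"):
    """SQLAlchemy URI Superset uses to reach a Spark Thrift server via PyHive."""
    return f"hive://{host}:{port}/{database}"

# df = load_solr_collection("localhost:9983", "products")
# df.createOrReplaceTempView("products")  # now queryable through Thrift
# Then add superset_uri("localhost") as the database URI in Superset.
```

The view name you register is what shows up as a table when Superset queries the Thrift server.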
