
I have a pipeline of HBase, Lily, Solr and Hue set up for search and visualization. I can search the data indexed in Solr using Hue, but I cannot view all the required data because not all of the HBase fields are stored in Solr, and I am not planning to store all of the data there.

So is there a way of retrieving those fields from HBase along with the Solr response, so that the data can be visualized with Hue?

From what I know, I believe it is possible to set up the Solr SearchHandler to do this, but I haven't been able to find a concrete example to help me understand it better. (I am very new to both Solr and HBase, so examples help.)

My question is similar to this question, but I am unable to comment there to ask for further information.

Current solution, thanks to a suggestion by Romain: I used the HTML widget to provide a link from each record on the Hue Search page back to the corresponding HBase record in the HBase Browser.

Abhilash

1 Answer


One approach is to fetch the required ids from Solr and then get the actual data from HBase. Solr gives you the match count for your query, plus faceting features; once you have those, the complete data is always available in HBase. Solr is best used for index search, so given the trade-off between speed and space, this design can help. Another main reason is that HBase gives good fetch times for an entire row when it is looked up by row key, so the overall performance also depends on your HBase row key design.

I think you are using the Lily HBase Indexer, if I am not wrong. In that case the doc id is, by default, the HBase row key, which might make things easier.
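The two-step lookup described above (ids from Solr, full rows from HBase) could be sketched roughly as follows. This is only an illustration under several assumptions: the URLs, the collection name `my_collection` and the table name `my_table` are hypothetical; HBase is assumed to be reachable through its REST server; and row keys, column names and cell values are assumed to be UTF-8 strings. It runs outside Solr, as a separate client process.

```python
import base64
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoints and names; replace them with your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/my_collection/select"
HBASE_REST_URL = "http://localhost:8080"  # assumes the HBase REST server is running
HBASE_TABLE = "my_table"


def http_get_json(url):
    """GET a URL, asking for JSON, and parse the response body."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def solr_ids(query, rows=10, fetch=http_get_json):
    """Step 1: ask Solr only for the ids of matching documents.

    With the Lily HBase Indexer defaults, each id is the HBase row key.
    """
    params = urlencode({"q": query, "fl": "id", "rows": rows, "wt": "json"})
    result = fetch(SOLR_SELECT_URL + "?" + params)
    return [doc["id"] for doc in result["response"]["docs"]]


def decode_hbase_row(payload):
    """Decode the REST server's base64-encoded cells into {column: value}.

    Assumes every column name and value is a UTF-8 string.
    """
    cells = payload["Row"][0]["Cell"]
    return {
        base64.b64decode(c["column"]).decode("utf-8"):
            base64.b64decode(c["$"]).decode("utf-8")
        for c in cells
    }


def hbase_row(row_key, fetch=http_get_json):
    """Step 2: fetch one complete row from HBase by its row key."""
    url = "{}/{}/{}".format(HBASE_REST_URL, HBASE_TABLE, quote(row_key, safe=""))
    return decode_hbase_row(fetch(url))


def search_with_full_rows(query, rows=10, fetch=http_get_json):
    """Search Solr, then look up every hit in HBase."""
    return {rk: hbase_row(rk, fetch) for rk in solr_ids(query, rows, fetch)}
```

Calling `search_with_full_rows("name:Alice")` (with `name` standing in for any indexed field) would return a dictionary keyed by row key, each value holding that row's `family:qualifier` columns. The `fetch` parameter exists only so the HTTP layer can be swapped out, for example in tests.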

Ramzy
  • Thank you. That is the general idea that I had too, and yes, I am using the Lily Hbase Indexer and the doc id is the hbase row key. So more specifically, is it possible to get Hbase data as part of the Solr query result, or does it have to be an outside process? – Abhilash Jul 15 '15 at 15:33
  • The ideal way is to get the facet counts and ids from Solr, and then the complete data from the data source. You can give a helpful vote if it helped and wait for answers from others. Thanks – Ramzy Jul 15 '15 at 15:36
  • thank you. I'll see how I can do this. It would be great if you can guide me towards any resource that can help. I'm not entirely sure how 'complete data from data source' will work. – Abhilash Jul 15 '15 at 16:02
  • I mean to say, when you are using the HBase indexer, your Solr document id is the HBase row key. So you search Solr on any indexed field and you get the id. That is the row key. Then you go to HBase (maybe with HBase Java client code) with that id as the row key and fetch the entire row, where you have the complete data – Ramzy Jul 15 '15 at 17:01
  • Yup, that I understood. That would be a two step process to retrieve ID and use that to get the row from Hbase. I can do that programmatically if I am building my own search UI. But since I am using tools such as Banana/Hue for visualization that work directly on Solr results, is it in any way possible to configure/program Solr such that it does the fetching from Hbase for me? I mean to say can Solr provide its results and the Hbase results as part of its JSON response? – Abhilash Jul 15 '15 at 19:07
  • So you need the Hue input (a JSON) to include the HBase data (which could come from the two-step design above)? Am I correct? – Ramzy Jul 15 '15 at 20:07
  • This is correct, the Search UI does not support this OOTB. There is a plan to let the user edit the indexed document directly, or to have a kind of link that opens the original record if it is on HDFS, HBase etc: https://issues.cloudera.org/browse/HUE-2857 Right now the latter can be done with the HTML widget, but it requires a bunch of custom js/html code. – Romain Jul 16 '15 at 06:06
  • @Romain Thank you for clarifying. I think I'll try providing a link to the original record. That should work for now. – Abhilash Jul 16 '15 at 14:20