0

I started develop facet search in my project. As example how it should work I took asos.com. As engine I use mysql for no fulltext search and sphinx for fulltext search. For faster search I denormalized data in db and created special table for this.

CREATE TABLE `item_search` (
  `id` int(10) unsigned NOT NULL DEFAULT '0',
  `item_id` int(10) unsigned NOT NULL DEFAULT '0',
  `tab` tinyint(3) unsigned NOT NULL,
  `designer_id` smallint(5) unsigned NOT NULL,
  `name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `price` decimal(18,2) NOT NULL,
  `arm` tinyint(3) unsigned NOT NULL,
  `bridge` tinyint(3) unsigned NOT NULL,
  `lens_width` tinyint(3) unsigned NOT NULL,
  `rating` int(11) unsigned DEFAULT '0',
  `pageviews` int(11) unsigned DEFAULT '0',
  `category_id` int(4) unsigned NOT NULL,
  KEY `index2` (`item_id`,`tab`,`price`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
/*!50100 PARTITION BY LIST (tab)
(PARTITION pg VALUES IN (1) ENGINE = InnoDB,
 PARTITION rx VALUES IN (2) ENGINE = InnoDB,
 PARTITION sun VALUES IN (3) ENGINE = InnoDB,
 PARTITION acc VALUES IN (4) ENGINE = InnoDB)

How to this looks in code.

if (Arr::get($args, 'query')){
  // Get item ids that was found by query
  $args['item_ids'] = Items::findByFullText($args);
}

...//Some code here

if ($item_ids = Arr::get($args, 'item_ids')){
  $where[] = "item_id IN (".implode(',', $item_ids).")";
}

...// Build and execute query and return item_ids

I can send multiple categories in multiple groups to search.

if ($categories = Arr::get($args, 'categories')){
  // group categories by categories group
  $categories = Category::groupCategories($categories);

  foreach ($categories as $group_id => $category_group){
   foreach ($category_group as $category_id){
    $item_ids[$group_id][] = Category::getItemIdsByCategory($category_id);
   }

   // Sum all item ids by this group
   if (sizeof($item_ids[$group_id]) == 1){
    $item_ids_by_group[$group_id] = $item_ids[$group_id][0];
   }else{
    $item_ids_by_group[$group_id] = call_user_func_array('array_merge', $item_ids[$group_id]);
   }
  }

  // Return found item ids by categories
   (sizeof($item_ids_by_group) == 1)
        ? $item_ids_by_group[$group_id]
        : call_user_func_array('array_intersect', $item_ids_by_group);
}

This is works pretty fast if enable cache for results, Only one problem that have here that no have possibility fetch categories like asos.do. What logic should be for returning categories ids for category groups? Many thanks for any response.

Andrii Tarykin
  • 628
  • 3
  • 9
  • 26
  • Why not use Solr or Sphinx for such things? They're built especially for such things. – Gabriel Nov 14 '13 at 10:33
  • Can you provide to me link of sphinx realisation for this issue, all examples that I seen before works in wrong way. Also reindex big sphinx index is cost to mutch time, mutch more than generate table with data. Good solution is elasticsearch but I don't have time for implement this technology. – Andrii Tarykin Nov 14 '13 at 11:19
  • Au contraire my friend. Check out my answer :) – Gabriel Nov 14 '13 at 11:27

1 Answers1

1

Mysql uses B-TREEs to store indexes, Sorl on Lucene (which is a powerfull Java library especially for searching stuff) uses Term Dictionaries and Binary Search for those terms. Indexing a billion documents in Solr, takes under 1 second, beat that with any sql solution.
Here is a nice tutorial: http://searchhub.org/2009/09/02/faceted-search-with-solr/
I have testeted with 500 milion documents and still got under 1 second results, these exact results can be proven by similar experiences on a simple google search.
Also, I might add, that since the overhead of using Mysql for faceting, it's not in use any more.

Basically the framework that is in used today as a best practice is write in Memcache/Solr and your primary Sql solution and then read only from Solr/Memcache.

Here's an eBay replica I used to work for and used Solr as it's faceting and search engine: http://www.okazii.ro/

Also, if you insist on sphinx, try: http://www.dreamstime.com

Also, check out this comparisson:

Choosing a stand-alone full-text search server: Sphinx or SOLR?

As you can see: Solr comes with facet support out of the box. Faceting in Sphinx takes more work.

Community
  • 1
  • 1
Gabriel
  • 772
  • 1
  • 13
  • 37
  • Thank you for your response, I will check Sorl in free time, also I have looked you search system, it looks good but works correct to me, [wrong results](http://www.okazii.ro/fashion-lifestyle/copii/?at[marime][]=28&at[culoare][]=Rosu&state=2&invoice=1&tari=1&judete=0&judete_lp=0&active_de=0&expira_in=0&l2c=14234&sort=bestmatch&per_page=48) Last category that I clicked is Cu factura show me 0 results, but this is not should be possible click to this category because I don't have search results for this. – Andrii Tarykin Nov 14 '13 at 11:45
  • I have updated my answer, please check it out. If this seems to solve your problem, check it as an answer. Thanks and good luck :) – Gabriel Nov 14 '13 at 11:49
  • @AndreyTarykin: The example it's not used to be a full glove solution, it works for us cause that is how we wanted it to function, you can make it work another way, what it is important is the fact that the faceting for over 1.000.000 products happens almost instant, this is because of the backend solr. – Gabriel Nov 14 '13 at 13:07