I have a Ruby on Rails application that lets users search nationwide property data (103 million records). We use the searchkick gem to index our Property model in AWS OpenSearch. When I call Property.reindex, everything works correctly and all 103 million records are added to the index.

However, we run a monthly import to update the property data (there are some new records, but it's mostly updates). During the import we need to update the index entry for each property that changes, so we call property.reindex on each one. The problem is that with every import the searchable document count grows by the number of properties added plus updated (see chart).

[Chart: Searchable document count over time]

I was under the impression that reindexing a single property would update its existing document in OpenSearch rather than add another one. If this continues, our OpenSearch cluster will run out of space. What can I do to avoid the extra searchable documents? I'd even settle for a way to clean them up after the fact (preferably without having to rebuild the entire index).
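One way to narrow this down is to compare the index's own counters against the database. The sketch below is only a diagnostic aid, not a fix: it assumes you have fetched one row from `GET _cat/indices/<index>?format=json` (which reports `docs.count` and `docs.deleted` as strings) and parsed it. The helper name `index_doc_summary`, the index name `properties_example`, and the sample numbers are all made up for illustration.

```ruby
require "json"

# Summarize one row of `GET _cat/indices/<index>?format=json`.
# db_count is the number of rows in the source table (e.g. Property.count).
def index_doc_summary(cat_row, db_count)
  live = Integer(cat_row.fetch("docs.count"))       # live documents in the index
  deleted = Integer(cat_row.fetch("docs.deleted"))  # deleted, not yet merged away
  { live: live, deleted: deleted, extra_live: live - db_count }
end

# Hypothetical sample response; real values come from the cluster.
sample = JSON.parse('[{"index":"properties_example","docs.count":"105","docs.deleted":"12"}]')
summary = index_doc_summary(sample.first, 103)
puts summary
```

If `extra_live` is positive, the index may hold more than one live document per record, which would point toward documents being written under different IDs. If instead `deleted` is large, the growth may be superseded versions that have not been merged away yet; in that case `POST /<index>/_forcemerge?only_expunge_deletes=true` is the OpenSearch call that asks for them to be expunged. Which of these applies here is not established by the question.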

Any help is appreciated. Thanks!

Sam
siannopollo
  • Are you sure you're using the same ID each time on each reindex? If not, you keep creating new (duplicate) documents – Val Jun 03 '22 at 07:55
  • @Val The ID _should_ remain the same, unless searchkick is doing some magic somewhere. I'll need to do a little digging to find out for sure. – siannopollo Jun 03 '22 at 16:35
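
The point raised in these comments rests on how indexing by ID behaves: writing a document under an `_id` that already exists replaces it, while writing under a new `_id` creates an additional document. The toy model below illustrates only that behavior in plain Ruby. `ToyIndex` is a hypothetical in-memory stand-in, not Searchkick or OpenSearch code, and the `"1-v2"` ID is invented for the example.

```ruby
# Toy in-memory stand-in for an index: documents are keyed by _id.
class ToyIndex
  def initialize
    @docs = {}
  end

  # Indexing under an existing _id replaces that document.
  def index(id, body)
    @docs[id.to_s] = body
  end

  def count
    @docs.size
  end
end

idx = ToyIndex.new
idx.index(1, { city: "Austin" })
idx.index(1, { city: "Dallas" })       # same _id: document is replaced
idx.index("1-v2", { city: "Dallas" })  # different _id: a second document appears
puts idx.count
```

Under this model the count only grows when the ID changes between writes, so checking the `_id` values actually stored for a few updated properties would confirm or rule out that explanation.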

0 Answers