I have a Ruby on Rails application that lets users search nationwide property data (103 million records). We use the Searchkick gem to index our Property model in AWS OpenSearch. When I call Property.reindex, everything works correctly and all 103 million records are added to the index.
However, we run a monthly import to update the property data (there are some new records, but it's mostly updates). During the import we need to update the index entry for each property that changes, so we call property.reindex on each one. The problem is that with every import the searchable document count increases by the number of properties added plus updated (see chart).
I was under the impression that reindexing a single property would update its existing document in OpenSearch rather than add another entry. If this continues, our OpenSearch cluster will run out of space. What can I do to avoid the extra searchable documents? I'd even settle for a way to clean them up after the fact (preferably without having to rebuild the entire index).
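To make the expected behavior concrete, here is a toy in-memory model of an index keyed by document id. It is only a sketch of the expectation described above, not Searchkick or OpenSearch code: `ToyIndex` and its methods are hypothetical names invented for illustration, and the sample records are made up.

```ruby
# Toy model (hypothetical, not Searchkick): an index keyed by document id.
class ToyIndex
  def initialize
    @docs = {}
  end

  # Indexing under an existing id overwrites that document
  # instead of adding a second one.
  def index(id, body)
    @docs[id] = body
  end

  def count
    @docs.size
  end
end

index = ToyIndex.new

# Initial full reindex: two properties.
index.index(1, { address: "1 Main St", owner: "A" })
index.index(2, { address: "2 Main St", owner: "B" })

# Monthly import: one update to an existing property, one new property.
index.index(1, { address: "1 Main St", owner: "C" })
index.index(3, { address: "3 Main St", owner: "D" })

puts index.count # 3: the count grows only by the one new record
```

Under this model, the document count after an import should rise by the number of new properties only, not by new plus updated, which is the opposite of what the chart shows.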
Any help is appreciated. Thanks!