
Use case: I want to develop a microservice with Spring Boot and Elasticsearch following the search-as-you-type pattern. In other words, if I type "d" I want to get back Demetrio, Denis and Daniel. Typing the second letter "e" brings Demetrio and Denis, and finally the third letter retrieves the exact name. Typing letters from the middle of a name should also work: "en" should bring Denis and Daniel. A pretty common search-as-you-type case.

I am studying the recommendations found in:

  • edge_ngram
  • search-as-you-type field type
  • search_analyzer

Current issue: when I boot my application, which is meant to create and configure the Elasticsearch index, I get the exception analyzer [autocomplete_index] not found for field [palavra]. The index is created successfully and my initial data is loaded, but the analyzer seems to be completely ignored.

Full log while booting the Spring Boot application:

2020-04-10 14:27:40.281  INFO 16556 --- [           main] com.poc.search.SearchApplication         : Starting SearchApplication on SPANOT164 with PID 16556 (C:\WSs\elasticsearch\search\target\classes started by Cast in C:\WSs\elasticsearch\search)
2020-04-10 14:27:40.286  INFO 16556 --- [           main] com.poc.search.SearchApplication         : No active profile set, falling back to default profiles: default
2020-04-10 14:27:40.863  INFO 16556 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data Elasticsearch repositories in DEFAULT mode.
2020-04-10 14:27:40.931  INFO 16556 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 62ms. Found 1 Elasticsearch repository interfaces.
2020-04-10 14:27:41.101  INFO 16556 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data Reactive Elasticsearch repositories in DEFAULT mode.
2020-04-10 14:27:41.120  INFO 16556 --- [           main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 13ms. Found 0 Reactive Elasticsearch repository interfaces.
2020-04-10 14:27:42.343  INFO 16556 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2020-04-10 14:27:42.360  INFO 16556 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2020-04-10 14:27:42.360  INFO 16556 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.33]
2020-04-10 14:27:42.496  INFO 16556 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2020-04-10 14:27:42.496  INFO 16556 --- [           main] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 2122 ms
2020-04-10 14:27:43.221  INFO 16556 --- [           main] o.elasticsearch.plugins.PluginsService   : no modules loaded
2020-04-10 14:27:43.222  INFO 16556 --- [           main] o.elasticsearch.plugins.PluginsService   : loaded plugin [org.elasticsearch.index.reindex.ReindexPlugin]
2020-04-10 14:27:43.222  INFO 16556 --- [           main] o.elasticsearch.plugins.PluginsService   : loaded plugin [org.elasticsearch.join.ParentJoinPlugin]
2020-04-10 14:27:43.222  INFO 16556 --- [           main] o.elasticsearch.plugins.PluginsService   : loaded plugin [org.elasticsearch.percolator.PercolatorPlugin]
2020-04-10 14:27:43.222  INFO 16556 --- [           main] o.elasticsearch.plugins.PluginsService   : loaded plugin [org.elasticsearch.script.mustache.MustachePlugin]
2020-04-10 14:27:43.222  INFO 16556 --- [           main] o.elasticsearch.plugins.PluginsService   : loaded plugin [org.elasticsearch.transport.Netty4Plugin]
2020-04-10 14:27:45.480  INFO 16556 --- [           main] o.s.d.e.c.TransportClientFactoryBean     : Adding transport node : 127.0.0.1:9300
2020-04-10 14:27:47.539 ERROR 16556 --- [           main] .d.e.r.s.AbstractElasticsearchRepository : failed to load elasticsearch nodes : org.elasticsearch.index.mapper.MapperParsingException: analyzer [autocomplete_index] not found for field [palavra]
2020-04-10 14:27:47.775  INFO 16556 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'
2020-04-10 14:27:48.333  INFO 16556 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
2020-04-10 14:27:48.334  INFO 16556 --- [           main] com.poc.search.SearchApplication         : Started SearchApplication in 8.714 seconds (JVM running for 9.159)

elastic-analyzer.json from resources/data/es-config

{
  "analysis": {
    "filter": {
      "autocomplete_filter": {
        "type": "edge_ngram",
        "min_gram": 1,
        "max_gram": 20
      }
    },
    "analyzer": {
      "autocomplete_search": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase"
        ]
      },
      "autocomplete_index": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "autocomplete_filter"
        ]
      }
    }
  }
}
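To see what this configuration is meant to do at index time versus search time, here is a rough plain-Java approximation of the two analyzers. It is only a sketch under simplifying assumptions: the standard tokenizer is approximated by splitting on non-letter/digit characters, and a document counts as a hit when any search term equals one of its indexed terms, which roughly mirrors a default match query. The class EdgeNgramDemo and its methods are hypothetical and not part of Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class EdgeNgramDemo {

    // Rough stand-in for the "standard" tokenizer: split on anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Approximates "autocomplete_index": lowercase, then edge_ngram (prefixes of minGram..maxGram chars).
    public static Set<String> indexTerms(String text, int minGram, int maxGram) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : tokenize(text)) {
            String lower = token.toLowerCase(Locale.ROOT);
            for (int len = minGram; len <= Math.min(maxGram, lower.length()); len++) {
                terms.add(lower.substring(0, len));
            }
        }
        return terms;
    }

    // Approximates "autocomplete_search": lowercase only, no n-grams.
    public static List<String> searchTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // A name is a hit if any search term equals one of its indexed terms (min_gram=1, max_gram=20).
    public static List<String> search(List<String> names, String input) {
        List<String> query = searchTerms(input);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            Set<String> indexed = indexTerms(name, 1, 20);
            for (String term : query) {
                if (indexed.contains(term)) {
                    hits.add(name);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Demetrio", "Denis", "Daniel");
        System.out.println(indexTerms("Demetrio", 1, 20)); // [d, de, dem, deme, demet, demetr, demetri, demetrio]
        System.out.println("d  -> " + search(names, "d"));  // d  -> [Demetrio, Denis, Daniel]
        System.out.println("de -> " + search(names, "de")); // de -> [Demetrio, Denis]
        System.out.println("en -> " + search(names, "en")); // en -> []
    }
}
```

Because edge_ngram only emits prefixes of each token, "en" produces no hits in this sketch: matching a fragment from the middle of a name, as in the use case above, needs something other than prefix n-grams.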

ElasticSearchDataLoader

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.type.CollectionType;
import com.fasterxml.jackson.databind.type.TypeFactory;
import com.poc.search.model.Correntista;
import com.poc.search.service.CorrentistaService;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

@Component
public class ElasticSearchDataLoader implements CommandLineRunner {

    @Value("classpath:data/correntistas.json")
    private Resource usersJsonFile;

    @Autowired
    private CorrentistaService correntistaService;

    @Override
    public void run(String... args) throws Exception {
        if (this.isInitialized()) {
            return;
        }

        List<Correntista> users = this.loadUsersFromFile();
        users.forEach(correntistaService::save);
    }

    private List<Correntista> loadUsersFromFile() throws IOException {
        ObjectMapper objectMapper = new ObjectMapper();
        CollectionType collectionType = TypeFactory.defaultInstance().constructCollectionType(List.class, CorrentistaInitData.class);
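        // getInputStream() also works when the application runs from a packaged jar; getFile() does not
        List<CorrentistaInitData> allFakeUsers = objectMapper.readValue(this.usersJsonFile.getInputStream(), collectionType);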
        return allFakeUsers.stream().map(this::from).map(this::generateId).collect(Collectors.toList());
    }

    private Correntista generateId(Correntista correntista) {
        correntista.setId(UUID.randomUUID().toString());
        return correntista;
    }

    private Correntista from(CorrentistaInitData correntistaJson) {
        Correntista correntista = new Correntista();
        correntista.setConta(correntistaJson.getConta());
        correntista.setSobrenome(correntistaJson.getSobrenome());
        correntista.setPalavra(correntistaJson.getNome());
        return correntista;
    }

    private boolean isInitialized() {
        return this.correntistaService.count() > 0;
    }
}

Correntista model

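import lombok.Getter;
import lombok.Setter;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "correntistas")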
@Setting(settingPath = "es-config/elastic-analyzer.json")
@Getter
@Setter
public class Correntista {
    @Id
    private String id;
    private String conta;
    private String sobrenome;

    @Field(type = FieldType.Text, analyzer = "autocomplete_index", searchAnalyzer = "autocomplete_search")
    private String palavra;
}

application.yml

spring:
  data:
    elasticsearch:
      cluster-name: docker-cluster
      cluster-nodes: localhost:9300

Application class:

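import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@EnableElasticsearchRepositories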
@SpringBootApplication
public class SearchApplication {

    public static void main(String[] args) {
        SpringApplication.run(SearchApplication.class, args);
    }

}

Elastic index settings

{
    "correntistas": {
        "settings": {
            "index": {
                "refresh_interval": "1s",
                "number_of_shards": "5",
                "provided_name": "correntistas",
                "creation_date": "1586539666845",
                "store": {
                    "type": "fs"
                },
                "number_of_replicas": "1",
                "uuid": "2eEha4aMQm2bdut4pd0aAg",
                "version": {
                    "created": "6080499"
                }
            }
        }
    }
}

All data was initially loaded as expected:

{
  "took": 66,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "correntistas",
        "_type": "correntista",
        "_id": "7353cd8c-791d-47f5-90b6-a1b5bcf83853",
        "_score": 1.0,
        "_source": {
          "id": "7353cd8c-791d-47f5-90b6-a1b5bcf83853",
          "conta": "1234",
          "sobrenome": "Carvalho",
          "palavra": "Demetrio"
        }
      },
      {
        "_index": "correntistas",
        "_type": "correntista",
        "_id": "122db1bc-584d-4bef-b5ea-3d9e0d42448e",
        "_score": 1.0,
        "_source": {
          "id": "122db1bc-584d-4bef-b5ea-3d9e0d42448e",
          "conta": "5678",
          "sobrenome": "Carv",
          "palavra": "Deme"
        }
      }
    ]
  }
}

So, my main question is: why isn't the analyzer created when the index is created successfully? A related question: why does "failed to load elasticsearch nodes" show up even though the data was loaded correctly?

Jim C
  • Which versions of Spring Boot, Spring Data Elasticsearch, Elasticsearch client and Elasticsearch server are you using? – P.J.Meisch Apr 11 '20 at 06:23
  • @P.J.Meisch I try to follow https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#preface.versions. So my versions are: Spring Data Elasticsearch 3.2.6; Elasticsearch server 6.8.4, started in Docker with docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.8.4. I am not sure what you mean by Elasticsearch client; I guess you mean the library pulled from the Maven repository by Spring Data, which is elasticsearch-6.8.7.jar and elasticsearch-core-6.8.7.jar. I am not using Elasticsearch 7+ because of the compatibility grid in the Spring docs. – Jim C Apr 11 '20 at 13:58

1 Answer


In your description of the files you write:

elastic-analyzer.json from resources/data/es-config

but in your @Setting annotation the data/ part of that path is missing. You should change it to:

@Setting(settingPath = "data/es-config/elastic-analyzer.json")

or move the JSON file one directory up, to resources/es-config.

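Because of this wrong path, the settings weren't written to the index when it was created, so the analyzer is not available, which then leads to the error message you see. The "failed to load elasticsearch nodes" prefix is just how the repository logs an exception raised while creating the index and its mapping; the index and your documents are still created, but palavra ends up with a dynamically generated mapping instead of your analyzer.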
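A quick way to catch this kind of mistake is to check that the path actually resolves on the classpath, since settingPath is read as a classpath resource. The following is a minimal plain-Java sketch of such a check; SettingPathCheck and existsOnClasspath are hypothetical helpers, not part of Spring Data, and the two candidate paths are the one from the question and the corrected one from this answer.

```java
import java.net.URL;

public class SettingPathCheck {

    // Returns true if the given classpath-relative path resolves to a resource.
    // A leading "/" is stripped because ClassLoader.getResource expects paths without it.
    public static boolean existsOnClasspath(String path) {
        String normalized = path.startsWith("/") ? path.substring(1) : path;
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = SettingPathCheck.class.getClassLoader();
        }
        URL url = loader.getResource(normalized);
        return url != null;
    }

    public static void main(String[] args) {
        String[] candidates = {
            "es-config/elastic-analyzer.json",     // path used in the question's @Setting
            "data/es-config/elastic-analyzer.json" // corrected path suggested above
        };
        for (String path : candidates) {
            System.out.println(path + " -> " + (existsOnClasspath(path) ? "found" : "NOT FOUND"));
        }
    }
}
```

Run it with the application's classpath (for example from the IDE); with the file in src/main/resources/data/es-config, only the second path should resolve.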

Another thing: when loading your data, instead of calling save for every entity, you should collect them in a list and do a batch insert with saveAll; that performs much better.
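If the initial load grows large (the comments below mention about 10 million names), one possible variation is to call saveAll on fixed-size chunks instead of one huge list. Below is a minimal plain-Java sketch of that idea; saveInBatches is a hypothetical helper, and the Consumer stands in for whatever persists a batch, for example a Spring Data repository's saveAll method reference, assuming CorrentistaService exposes or wraps such a repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class BatchSaver {

    // Hypothetical helper: passes 'items' to 'saveBatch' in consecutive chunks of at most
    // 'batchSize' elements and returns how many chunks were sent. In the data loader,
    // 'saveBatch' could be something like repository::saveAll (an assumption).
    public static <T> int saveInBatches(List<T> items, int batchSize, Consumer<List<T>> saveBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the sub-list so the consumer does not keep a view backed by the original list
            saveBatch.accept(new ArrayList<>(items.subList(from, to)));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Demetrio", "Deme", "Denis", "Daniel", "Anna");
        List<List<String>> sent = new ArrayList<>();
        int count = saveInBatches(names, 2, sent::add);
        System.out.println(count + " batches: " + sent);
        // prints: 3 batches: [[Demetrio, Deme], [Denis, Daniel], [Anna]]
    }
}
```

The batch size is a tuning knob; a suitable value depends on document size and cluster resources.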

P.J.Meisch
  • Meisch, thank you. You certainly answered my main question. Could you kindly share how you would achieve my final goal, "search-as-you-type" not only from the beginning of a word but also from the middle? Do you think I am heading in the right direction with "type": "edge_ngram"? Would you favour another type if you had to search very simple documents ("id", "Name")? The expected index volume is 10 million names. So typing "an" should also bring Anna and Daniel. – Jim C Apr 11 '20 at 14:09
  • Meisch, if you don't mind, I created another question, because the analyzer only works when it is created through Spring Data; it seems to be ignored during search when it is created straight from curl/Postman. Are you aware of any "extra step" done by Spring Data, or of some way to make Spring print to the console the commands posted to Elasticsearch? https://stackoverflow.com/questions/61158504/elasticsearch-analizer-working-when-created-throw-springdata-but-failing-when-cr – Jim C Apr 11 '20 at 14:41
  • Regarding Spring Data Elasticsearch logging: I could enable printing of the queries, but not of the requests sent during Spring Boot initialization. I added logging.level.org.springframework.data.elasticsearch.core: DEBUG. It would be useful if there were some way to print to the console the commands sent by spring-data-elasticsearch during Spring Boot initialization, since I am using org.springframework.boot.CommandLineRunner to create the index and analyzer and to load the test data. – Jim C Apr 12 '20 at 14:20
  • For logging, see https://docs.spring.io/spring-data/elasticsearch/docs/3.2.6.RELEASE/reference/html/#elasticsearch.clients.logging – P.J.Meisch Apr 12 '20 at 16:23