In my example code I am using the php client library, but it should be understood by anyone familiar with elasticsearch.
I'm using elasticsearch to create an index where each document contains an array of nGram indexed authors. Initially, the document will have a single author, but as time progresses, more authors will be appended to the array. Ideally, a search could be executed by an author's name, and if any of the authors in the array get matched, the document will be found.
I have been trying to use the documentation here for appending to the array and here for using the array type - but I have not had success getting this working.
First, I want to create an index for documents, with a title, array of authors, and an array of comments.
$client = new Client();
$params = [
'index' => 'document',
'body' => [
'settings' => [
// Simple settings for now, single shard
'number_of_shards' => 1,
'number_of_replicas' => 0,
'analysis' => [
'filter' => [
'shingle' => [
'type' => 'shingle'
]
],
'analyzer' => [
'my_ngram_analyzer' => [
'tokenizer' => 'my_ngram_tokenizer',
'filter' => 'lowercase',
]
],
// Allow searching for partial names with nGram
'tokenizer' => [
'my_ngram_tokenizer' => [
'type' => 'nGram',
'min_gram' => 1,
'max_gram' => 15,
'token_chars' => ['letter', 'digit']
]
]
]
],
'mappings' => [
'_default_' => [
'properties' => [
'document_id' => [
'type' => 'string',
'index' => 'not_analyzed',
],
// The name, email, or other info related to the person
'title' => [
'type' => 'string',
'analyzer' => 'my_ngram_analyzer',
'term_vector' => 'yes',
'copy_to' => 'combined'
],
'authors' => [
'type' => 'list',
'analyzer' => 'my_ngram_analyzer',
'term_vector' => 'yes',
'copy_to' => 'combined'
],
'comments' => [
'type' => 'list',
'analyzer' => 'my_ngram_analyzer',
'term_vector' => 'yes',
'copy_to' => 'combined'
],
]
],
]
]
];
// Create index `person` with ngram indexing
$client->indices()->create($params);
Off the get go, I can't even create the index due to this error:
{"error":"MapperParsingException[mapping [_default_]]; nested: MapperParsingException[No handler for type [list] declared on field [authors]]; ","status":400}
HAD this gone successfully though, I would plan to create an index, starting with empty arrays for authors and title, something like this:
$client = new Client();
$params = array();
$params['body'] = array('document_id' => 'id_here', 'title' => 'my_title', 'authors' => [], 'comments' => []);
$params['index'] = 'document';
$params['type'] = 'example_type';
$params['id'] = 'id_here';
$ret = $client->index($params);
return $ret;
This seems like it should work if I had the desired index to add this structure of information to, but what concerns me would be appending something to the array using update
. For example,
$client = new Client();
$params = array();
//$params['body'] = array('person_id' => $person_id, 'emails' => [$email]);
$params['index'] = 'document';
$params['type'] = 'example_type';
$params['id'] = 'id_here';
$params['script'] = 'NO IDEA WHAT THIS SCRIPT SHOULD BE TO APPEND TO THE ARRAY';
$ret = $client->update($params);
return $ret;
}
I am not sure how I would go about actually appending a thing to the array and making sure it's indexed.
Finally, another thing that confuses me is how I could search based on any author in the array. Ideally I could do something like this:
But I'm not 100% whether it will work. Maybe there is something fundemental about elasticsearch that I am not understanding. I am completely new to so any resources that will get me to a point where these little details don't hang me up would be appreciated.
Also, any direct advice on how to use elasticsearch to solve these problems would be appreciated.
Sorry for the big wall of text, to recap, I am looking for advice on how to
- Create an index that supports nGram analysis on all elements of an array
- Updating that index to append to the array
- Searching for the now-updated index.
Thanks for any help
EDIT: thanks to @astax, I am now able to create the index and append to the value as a string. HOWEVER, there are two problems with this:
- the array is stored as a string value, so a script like
$params['script'] = 'ctx._source.authors += [\'hello\']';
actually appends a STRING with []
rather than an array containing a value.
the value inputted does not appear to be ngram analyzed, so a search like this:
$client = new Client(); $searchParams['index'] = 'document'; $searchParams['type'] = 'example_type'; $searchParams['body']['query']['match']['_all'] = 'hello'; $queryResponse = $client->search($searchParams); print_r($queryResponse); // SUCCESS
will find the new value but a search like this:
$client = new Client();
$searchParams['index'] = 'document';
$searchParams['type'] = 'example_type';
$searchParams['body']['query']['match']['_all'] = 'hel';
$queryResponse = $client->search($searchParams);
print_r($queryResponse); // NO RESULTS
does not