
I'm using Elasticsearch and Kibana for storing my logs. Now what I want is to extract a number from a field and store it in a new field.

So for instance, having this:

accountExist execution time: 1046 ms

I would like to extract the number (1046) and see it in a new field in Kibana.

Is this possible? How? Thanks for the help.

Pirulino

2 Answers


You'll need to do this before/during indexing.

Within Elasticsearch, you can get what you need during indexing:

  1. Define a new analyzer using the Pattern Analyzer to wrap a regular expression (for your purposes, to capture consecutive digits in the string - good answer on this topic).
  2. Create your new numeric field in the mapping to hold the extracted times.
  3. Use copy_to to copy the log message from the input field to the new numeric field from (2), where the new analyzer will parse it (a sketch follows this list).
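
A sketch of those three steps, assuming Elasticsearch 7+, an index named logs, and field names log_message and execution_time (all hypothetical). Note that the stock pattern analyzer splits on its regex rather than capturing, so this wraps the pattern tokenizer, whose group parameter extracts the capture group; see the comments below for an important caveat about copy_to:

PUT logs
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "digits": { "type": "pattern", "pattern": "([0-9]+) ms", "group": 1 }
      },
      "analyzer": {
        "extract_ms": { "type": "custom", "tokenizer": "digits" }
      }
    }
  },
  "mappings": {
    "properties": {
      "log_message": { "type": "text", "copy_to": "execution_time" },
      "execution_time": { "type": "text", "analyzer": "extract_ms" }
    }
  }
}

The destination field has to be text for an analyzer to run on it, so even when the digits are tokenized you don't get a true numeric field out of this.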

The Analyze API can be helpful for testing purposes.
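
For example, recent Elasticsearch versions accept an inline tokenizer definition in _analyze, so you can check the regex against a sample message before touching any mappings:

POST _analyze
{
  "tokenizer": { "type": "pattern", "pattern": "([0-9]+) ms", "group": 1 },
  "text": "accountExist execution time: 1046 ms"
}

This should come back with a single token, 1046.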

Peter Dixon-Moses
  • As mentioned, this must be done during indexing, so to add this for existing data, you'll need to re-index. Decent strategy for that here: http://stackoverflow.com/a/17446500/947986 – Dusty Oct 07 '15 at 19:09
  • 1
    Frustrating that you have to go to such lengths do do something so trival. Is there some reason why it's hard for kibana to pluck values out text for plotting etc? – Ben Hyde Aug 04 '16 at 14:27
  • Say you wanted to comb through the past week of logs (10MM records) looking for requests that took 1+ second. In a database, you'd do a table-scan of 10MM rows, processing a regex 10MM times followed by a CAST and a comparison.... That would take a long time! If you loaded the execution time into its own (indexed) field from the start, you'd be able to find the records in a few hops down a btree (microseconds?). Kibana relies on what Elasticsearch is good at: aggregation, not heavy-lifting. – Peter Dixon-Moses Aug 07 '16 at 02:55
  • I am afraid this solution won't work. See https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html: `It is the field value which is copied, not the terms (which result from the analysis process).` I tested it, and indeed this will not work, as the value `before` analysis is copied to the new field instead of the terms resulting from the analyzer. – Datageek Nov 07 '17 at 10:13
  • Your argument is correct. The analysis (which pulled out the number) would have to be performed on the destination field. But in hindsight, even if you extract the numeric part, you'd be doing "text" analysis on the destination field so the ultimate "term" tokenized would be of type "text" and not numeric as intended. In the past, I've always handled this sort of text parsing before sending the record to be indexed. – Peter Dixon-Moses Nov 10 '17 at 02:23
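
Picking up that last point: if you control the indexing path, one way to do the parsing before the document is indexed, without client-side code, is a grok processor in an ingest pipeline (Elasticsearch 5+). Pipeline and field names here are illustrative:

PUT _ingest/pipeline/extract_exec_time
{
  "description": "Parse the accountExist execution time into a numeric field",
  "processors": [
    {
      "grok": {
        "field": "log_message",
        "patterns": ["accountExist execution time: %{NUMBER:execution_time_ms:int} ms"]
      }
    }
  ]
}

Index with ?pipeline=extract_exec_time and execution_time_ms arrives as an integer that Kibana can aggregate and plot directly.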

While not performant, if you must avoid reindexing, you could use scripted fields in Kibana.

Introduction here: https://www.elastic.co/blog/using-painless-kibana-scripted-fields

  • enable Painless regex support by putting the following in your elasticsearch.yml:

    script.painless.regex.enabled: true

  • restart Elasticsearch
  • create a new scripted field in Kibana through Management -> Index Patterns -> Scripted Fields
  • select painless as the language and number as the type
  • create the actual script, for example:
def logMsg = params['_source']['log_message'];
if (logMsg == null) {
  // No log_message on this document: return a sentinel value.
  return -10000;
}
def m = /.*accountExist execution time: ([0-9]+) ms.*$/.matcher(logMsg);
if (m.matches()) {
  // Extract the captured digits and return them as a number.
  return Integer.parseInt(m.group(1));
} else {
  // Message didn't match the pattern: same sentinel value.
  return -10000;
}
  • you must reload the page completely for the new fields to be evaluated; simply re-running a search on an open Discover tab will not pick up the new fields. (This almost made me quit trying to get this working -.-)
  • use the script in discover or visualizations

While I understand that scripting fields over millions of log entries is not performant, my use case is a very specific log entry that is logged 10 times a day in total, and I only use the resulting fields to create a visualization or in analyses where I reduce the candidates through regular queries in advance.

It would be interesting to know whether those fields could be calculated only in situations where you need them (or where they make sense and are computable to begin with, i.e. making the return -10000 unnecessary). Currently they are evaluated and show up for every log entry.
You can generate scripted fields inside of queries like this: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html (see the sketch below), but that seems a bit too buried under the hood to maintain easily :/
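
For completeness, a minimal request-time sketch of that script_fields approach, reusing the regex from the scripted field above (index and field names are assumed, and script.painless.regex.enabled must be on as described earlier):

GET logs/_search
{
  "query": { "match_all": {} },
  "script_fields": {
    "execution_time_ms": {
      "script": {
        "lang": "painless",
        "source": "def msg = params['_source']['log_message']; if (msg == null) { return -10000; } def m = /accountExist execution time: ([0-9]+) ms/.matcher(msg); return m.find() ? Integer.parseInt(m.group(1)) : -10000;"
      }
    }
  }
}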

icyerasor