I am trying to reindex data and do some calculations based on fields in the source document. I have used ingest pipelines to enrich the documents with a geo_point, and I want to calculate some other values as well.
The issue I have is that the source data throws an error saying it can't be cast. Here are the details:
Raw (from ML csv input):
"_source": {
"Time": "18.06.2017 17:37:32",
"Weight (kg)": 286000,
"People": 2,
"Seats": "2"}
However, the import done with ML clearly specified the following convert processors:
{
  "convert": {
    "field": "Seats",
    "type": "long",
    "ignore_missing": true
  }
},
{
  "convert": {
    "field": "People",
    "type": "long",
    "ignore_missing": true
  }
}
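To check what those convert processors actually produce, the simulate API can be run against a test document (a sketch; the sample values are taken from my `_source` above):

```json
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "convert": { "field": "Seats", "type": "long", "ignore_missing": true } },
      { "convert": { "field": "People", "type": "long", "ignore_missing": true } }
    ]
  },
  "docs": [
    { "_source": { "Seats": "2", "People": 2 } }
  ]
}
```

If the processors run, the simulated output should show `"Seats": 2` as a number rather than a string.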
The incoming raw data is consistent in that all values are strictly numbers, no quotes etc. (the first three are the weight, the seats and the people):
66990;189;172;0;0;0;0;0
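For reference, if the splitting itself ever needs to move into the pipeline, a csv processor could parse such a line (a sketch; this assumes a recent Elasticsearch version that has the csv processor, and the source field name `message` plus the target field names are my guesses from the column order):

```json
{
  "csv": {
    "field": "message",
    "target_fields": ["Weight (kg)", "Seats", "People"],
    "separator": ";",
    "ignore_missing": true
  }
}
```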
For clarification, here is also the mapping/mapping template for the index used later on, which also shows the correct type:
"People": {
"type": "long"
},
"Seats": {
"type": "long"
},
Now, when I use a Kibana scripted field, I can calculate as follows:
if (doc['Seats'].value == 0) {
  return 0;
} else {
  long utilization = (doc['People'].value * 100) / doc['Seats'].value;
  return utilization;
}
everything works fine and I get a calculated utilization.
When I try to do the same with a script in the ingest pipeline, it fails with this error:
"caused_by" : {
"type" : "class_cast_exception",
"reason" : "cannot explicitly cast float [java.lang.String] to byte"
}
The code I use is as follows:
"script": {
"if": "!(ctx.Seats=0) && !(ctx.Seats==null)",
"lang": "painless",
"source": "ctx.utilization = (float)ctx.People*100.0/(float)ctx.Seats"
}
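One thing I am considering as a workaround (a defensive sketch, not a confirmed fix) is to parse the field inside the script when it is still a String, so the script no longer relies on the convert processors having run first:

```json
"script": {
  "if": "ctx.Seats != null && ctx.Seats != 0",
  "lang": "painless",
  "source": "float seats = ctx.Seats instanceof String ? Float.parseFloat(ctx.Seats) : (float) ctx.Seats; float people = ctx.People instanceof String ? Float.parseFloat(ctx.People) : (float) ctx.People; ctx.utilization = people * 100.0f / seats;"
}
```

One caveat: if Seats can arrive as the string "0", the `if` condition above would not catch it, so the parsing would have to be moved in front of the zero check instead.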
My questions are:
- why does the ML ingest behave differently (the raw data from the CSV is absolutely the same, only ints)?
- what can I do in the ingest pipeline to get this done?
- is the Kibana index pattern approach as performant as the ingest pipeline, or should I stick with the ingest pipeline in terms of load etc.?
Thanks for your help and hints.
Chibisuke