
I am trying to reindex data and do some calculations based on fields in the source document. I have used ingest pipelines to enrich the document with geo_point and want to calculate some other values as well.

The issue I have is that the source data throws an error saying it can't be cast. Here are the details:

Raw (from ML csv input):

"_source": {
  "Time": "18.06.2017 17:37:32",
  "Weight (kg)": 286000,
  "People": 2,
  "Seats": "2"
}

However, the ingest pipeline created by the ML import clearly specified the following convert processors:

{
  "convert": {
    "field": "Seats",
    "type": "long",
    "ignore_missing": true
  }
},{
  "convert": {
    "field": "People",
    "type": "long",
    "ignore_missing": true
  }
}

The incoming raw data is consistent in that all values are strictly numbers, no quotes etc. (the first three are the weight, the seats and the people):

66990;189;172;0;0;0;0;0

For clarification, here is also the mapping/mapping template for the index used later on, which shows the correct type:

"People": {
  "type": "long"
},
"Seats": {
  "type": "long"
},

Now, when I use a Kibana scripted field I can calculate as follows:

if (doc['Seats'].value == 0) {
  return 0;
} else {
  long utilization = (doc['People'].value * 100) / doc['Seats'].value;
  return utilization;
}

Everything works fine and I get the calculated utilization.

When I try to do the same with a script in the ingest pipeline, I get this error:

"caused_by" : {
          "type" : "class_cast_exception",
          "reason" : "cannot explicitly cast float [java.lang.String] to byte"
 }

The code I use is as follows:

"script": {
        "if": "!(ctx.Seats=0) && !(ctx.Seats==null)",
        "lang": "painless",
        "source": "ctx.utilization = (float)ctx.People*100.0/(float)ctx.Seats"
}

My questions are:

  1. Why does the ML ingest behave differently (the raw data from the CSV is absolutely the same, only ints)?
  2. What can I do in the ingest pipeline to get it done?
  3. Is the Kibana index-pattern way as performant as the ingest pipeline, or should I stick with the ingest pipeline in terms of load etc.?

Thanks for your help and hints.

Chibisuke

  • Can you update your question with the mapping of your index? Especially the `Seats` and `People` fields? – Val Nov 16 '20 at 16:01
  • Hi Val, thanks a lot for the answer, I have amended the post with the necessary details. However, I still don't get why the initial ML ingest pipeline fails to properly import the values (in terms of proper casting). Please also see my amendment with regards to the raw data. – Chibisuketyan Nov 17 '20 at 06:44

1 Answer


In the ingest pipeline, ctx.Seats will still be a string because it's a string in the source document. You either need to parse it in your script or convert it just before the script.

Option without conversion and simply parsing the value in the script:

"script": {
    "if": "!(ctx.Seats == \"0\") && !(ctx.Seats == null)",
    "lang": "painless",
    "source": "ctx.utilization = 100.0 * ctx.People / Float.parseFloat(ctx.Seats)"
}
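For intuition, the guard-and-parse logic above can be mirrored in a plain Python sketch (this is an illustration, not Painless; the sample values come from the CSV row in the question):

```python
def utilization(people, seats):
    """Mirror of the pipeline script: guard against null/zero,
    parse the string field, then compute utilization in percent."""
    # Seats arrives as a string in the raw document, so it must be parsed
    if seats is None or seats == "0":
        return None
    return 100.0 * people / float(seats)

# Sample row from the question: 189 seats, 172 people
print(round(utilization(172, "189"), 1))  # 91.0
```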

Option with conversion before running the script:

{
  "convert" : {
    "field" : "Seats",
    "type": "float",
    "ignore_missing": true
  }
},
{
  "script": {
    "if": "!(ctx.Seats==0) && !(ctx.Seats==null)",
    "lang": "painless",
    "source": "ctx.utilization = 100.0 * ctx.People / ctx.Seats"
  }
}
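A convenient way to verify either option before reindexing is the `_ingest/pipeline/_simulate` endpoint; a sketch of such a request (the test document below is taken from the sample in the question):

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      { "convert": { "field": "Seats", "type": "float", "ignore_missing": true } },
      {
        "script": {
          "if": "!(ctx.Seats==0) && !(ctx.Seats==null)",
          "lang": "painless",
          "source": "ctx.utilization = 100.0 * ctx.People / ctx.Seats"
        }
      }
    ]
  },
  "docs": [
    { "_source": { "People": 2, "Seats": "2" } }
  ]
}
```

The response shows the transformed document, so you can confirm `utilization` is computed before running the full reindex.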
Val
  • So, the ML conversion was correct and hence the mapping (long values) is also correct. What you need to realize is that when the document starts flowing through the ingest pipeline the values are still in raw form (i.e. ctx.Seats is still a string) because the document hasn't yet landed into the index (and hence the mapping hasn't kicked in yet). The `convert` processor in the ingest pipeline does exactly the same job as the ML one did. It would help if you could share your ML ingest pipeline – Val Nov 17 '20 at 07:19
  • Hi Val, super fast! Thanks. I still don't get it. The flow is as follows: I upload the raw data with ML (with the ML ingest pipeline created by the Kibana UI as mentioned at the beginning of the post). The indexed data ends up in an index like indexname-2017 for the year. I then apply another ingest pipeline with geo lookup etc, including the calculation. So my understanding is that by this time the index holds already the correct format. Or is it with the way that I do it (e.g. ctx.Field)? Thanks again for your patience!! – Chibisuketyan Nov 17 '20 at 08:07
  • I've used your sample CSV data above and reproduced the whole process (ML, data visualizer, import, etc). In my index, I can see that all the data fields are long and none are double-quoted. Not sure why you're seeing `"Seats": "2"` in your document. It would help if you could provide a complete reproduction of how you arrive at the error. I'm sure it'd be easier to spot the issue – Val Nov 17 '20 at 08:12
  • That's really weird; anyhow, I will not spend further time on investigation now, as I understood the things that I have done wrong in the script and your proposed solution works just perfectly fine. Thank you very much Val for your time! – Chibisuketyan Nov 17 '20 at 08:20
  • Awesome, this is great to hear! Glad I could help! – Val Nov 17 '20 at 08:21