Convert IP (string) into long in elasticsearch/kibana scripted fields

Question

I have a field in a doc that holds a string representation of an IPv4 address ("1.2.3.4"); the field is named "originating_ip". I'm trying to use a scripted field written in the Painless language to add a new field (originating_ip_calc) holding the int (long) representation of that IPv4 address.

The following script works in Groovy (and from what I understand Painless should behave almost the same), but apparently "almost" is not enough in this case.

String[] ipAddressInArray = "1.2.3.4".split("\\.");

long result = 0;
for (int i = 0; i < ipAddressInArray.length; i++) {
    int power = 3 - i;
    int ip = Integer.parseInt(ipAddressInArray[i]);
    long longIP = (ip * Math.pow(256, power)).toLong();
    result = result + longIP;
}
return result;

I also looked at this question, and as you can see, the code above is based on one of the answers there.

I also tried working with InetAddress, but had no luck.
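For comparison, the same loop compiles as plain Java once Groovy's `.toLong()` is replaced by a `(long)` cast. A sketch (the class and method names are hypothetical; it still relies on `String.split`, so it only runs in plain Java or Groovy, not in Painless):

```java
// Plain-Java port of the Groovy script above; names are hypothetical.
final class GroovyPortIpToLong {
    static long toLong(String ipAddress) {
        // String.split takes a regex, so the dot has to be escaped.
        String[] octets = ipAddress.split("\\.");
        long result = 0;
        for (int i = 0; i < octets.length; i++) {
            int power = 3 - i;
            int ip = Integer.parseInt(octets[i]);
            // Groovy's .toLong() does not exist on a Java double; cast instead.
            long longIP = (long) (ip * Math.pow(256, power));
            result = result + longIP;
        }
        return result;
    }
}
```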

You seem to be aware of IPv6; a pity you did not ask for more. `long longIP = (long) ip << power*8; result |= longIP;` (cast to long, bit shift left, bitwise or). — Joop Eggen, Aug 12 '20 at 10:42
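The bit-shift idea from the comment can be sketched in plain Java as follows (hypothetical class name; it uses `String.split`, so as written it does not run in Painless). One caveat worth flagging: in Java the type of a shift expression is the promoted type of its left operand, so the `int` must be widened to `long` before shifting. Otherwise `192 << 24` evaluates to the negative int -1073741824.

```java
// Shift-and-or variant of the conversion; names are hypothetical.
final class ShiftIpToLong {
    static long toLong(String ipAddress) {
        String[] octets = ipAddress.split("\\.");
        long result = 0;
        for (int i = 0; i < octets.length; i++) {
            int power = 3 - i;
            int ip = Integer.parseInt(octets[i]);
            // Widen to long before shifting; an int shifted left by 24
            // overflows for first octets >= 128.
            long longIP = ((long) ip) << (power * 8);
            result |= longIP;
        }
        return result;
    }
}
```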

Nikolay Vasiliev · Accepted Answer · 2020-08-12T10:34:38.290

With Elasticsearch painless scripting you can use code like the following:

POST ip_search/doc/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "originating_ip_calc": {
      "script": {
        "source": """
String ip_addr = params['_source']['originating_ip'];
def ip_chars = ip_addr.toCharArray();
int chars_len = ip_chars.length;
long result = 0;
int cur_power = 0;
int last_dot = chars_len;
for(int i = chars_len -1; i>=-1; i--) {
  if (i == -1 || ip_chars[i] == (char) '.' ){
    result += (Integer.parseInt(ip_addr.substring(i+ 1, last_dot)) * Math.pow(256, cur_power));
    last_dot = i;
    cur_power += 1;
  }
}         
return result
""",
        "lang": "painless"
      }
    }
  },
  "_source": ["originating_ip"]
}
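To try the same scanning logic outside Elasticsearch, here is a plain-Java port of the script (a sketch; the class and variable names are my own). In Java '.' is already a char, so no cast is needed, and the compound assignment narrows the `double` from `Math.pow` back to `long` just as the script does.

```java
// Plain-Java port of the Painless loop above; names are hypothetical.
final class CharScanIpToLong {
    static long toLong(String ipAddr) {
        char[] ipChars = ipAddr.toCharArray();
        int charsLen = ipChars.length;
        long result = 0;
        int curPower = 0;
        int lastDot = charsLen;
        // Walk from the right; i == -1 acts as a virtual dot before the first octet.
        for (int i = charsLen - 1; i >= -1; i--) {
            if (i == -1 || ipChars[i] == '.') {
                int octet = Integer.parseInt(ipAddr.substring(i + 1, lastDot));
                // Compound assignment implicitly casts the double back to long.
                result += octet * Math.pow(256, curPower);
                lastDot = i;
                curPower += 1;
            }
        }
        return result;
    }
}
```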

(Note that I used the Kibana console to send the request to ES; it turns the triple-quoted script into a properly escaped JSON string before sending.)

This will give a response like this:

"hits": [
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_source": {
      "originating_ip": "10.0.0.1"
    },
    "fields": {
      "originating_ip_calc": [
        167772161
      ]
    }
  },
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "1",
    "_score": 1,
    "_source": {
      "originating_ip": "1.2.3.4"
    },
    "fields": {
      "originating_ip_calc": [
        16909060
      ]
    }
  }
]
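As a quick check of those numbers: each octet is weighted by a power of 256 according to its position, most significant first.

```latex
\begin{aligned}
\text{1.2.3.4}  &\mapsto 1\cdot 256^3 + 2\cdot 256^2 + 3\cdot 256 + 4 = 16777216 + 131072 + 768 + 4 = 16909060\\
\text{10.0.0.1} &\mapsto 10\cdot 256^3 + 0\cdot 256^2 + 0\cdot 256 + 1 = 167772160 + 1 = 167772161
\end{aligned}
```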

But why does it have to be this way?

Why does the approach with `.split` not work?

If you send the code from the question to ES it replies with an error like this:

      "script": "String[] ipAddressInArray = \"1.2.3.4\".split(\"\\\\.\");\n\nlong result = 0;\nfor (int i = 0; i < ipAddressInArray.length; i++) {\n    int power = 3 - i;\n    int ip = Integer.parseInt(ipAddressInArray[i]);\n    long longIP = (ip * Math.pow(256, power)).toLong();\n    result = result + longIP;\n}\nreturn result;",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown call [split] with [1] arguments on type [String]."

This is mainly because Java's String.split() is not considered safe to use in Painless (it implicitly creates a regex Pattern), so it is not available on String. The suggested alternative is Pattern#split, but that requires regexes to be enabled in the node configuration (elasticsearch.yml), not per index.
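For reference, this is roughly what the Pattern#split route looks like in plain Java (a sketch with hypothetical names; in Painless it would additionally need the regex setting from the error below):

```java
import java.util.regex.Pattern;

// Pattern#split-based conversion; names are hypothetical.
final class PatternSplitIpToLong {
    // Compile the dot pattern once and reuse it for every call.
    private static final Pattern DOT = Pattern.compile("\\.");

    static long toLong(String ipAddress) {
        long result = 0;
        for (String octet : DOT.split(ipAddress)) {
            // Multiply by 256 to move earlier octets one byte to the left, then add this one.
            result = result * 256 + Integer.parseInt(octet);
        }
        return result;
    }
}
```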

By default, they are disabled:

      "script": "String[] ipAddressInArray = /\\./.split(\"1.2.3.4\");...
      "lang": "painless",
      "caused_by": {
        "type": "illegal_state_exception",
        "reason": "Regexes are disabled. Set [script.painless.regex.enabled] to [true] in elasticsearch.yaml to allow them. Be careful though, regexes break out of Painless's protection against deep recursion and long loops."

Why do we have to do an explicit cast `(char) '.'`?

So we have to split the string on dots manually. The straightforward approach is to compare each char of the string with '.' (which in Java is a char literal, not a String).

But in Painless a single-quoted literal such as '.' is a String, so we have to cast it explicitly to char (because we are comparing it with the elements of a char array).
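Splitting on dots by hand can be sketched in plain Java as a small helper (the class and the name `splitOnDots` are hypothetical). It compares each char with '.', which in Java needs no cast:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split on '.' by comparing characters, no regex involved.
final class ManualDotSplit {
    static List<String> splitOnDots(String s) {
        List<String> parts = new ArrayList<>();
        char[] chars = s.toCharArray();
        int start = 0;
        for (int i = 0; i < chars.length; i++) {
            // In Java '.' is already a char literal; Painless needs (char) '.' here.
            if (chars[i] == '.') {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start)); // last segment after the final dot
        return parts;
    }
}
```

The answer's script skips the intermediate list and converts each segment as soon as it reaches a dot, which avoids the extra allocation.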

Why do we have to work with char array directly?

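Because apparently Painless does not allow `.length` on a String either (note that in Java, too, `length` is a method, `length()`, rather than a field, which is exactly what the error complains about):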

    "reason": {
      "type": "script_exception",
      "reason": "compile error",
      "script_stack": [
        "\"1.2.3.4\".length",
        "         ^---- HERE"
      ],
      "script": "\"1.2.3.4\".length",
      "lang": "painless",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown field [length] for type [String]."
      }
    }
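Since `length()` and `charAt()` are ordinary `java.lang.String` methods, a single-pass variant can avoid the char array altogether. A plain-Java sketch (hypothetical names; it assumes well-formed dotted-decimal input and does no validation; whether your Painless version whitelists these methods is worth checking in the Painless API reference, and in Painless the '.' and '0' literals would need the same `(char)` casts discussed above):

```java
// Single-pass sketch using String.length() and charAt(); names are hypothetical.
final class CharAtIpToLong {
    static long toLong(String ip) {
        long result = 0;
        int octet = 0;
        for (int i = 0; i < ip.length(); i++) { // length() is a method, not a field
            char c = ip.charAt(i);
            if (c == '.') {
                // Close the current octet: move previous octets one byte to the left.
                result = result * 256 + octet;
                octet = 0;
            } else {
                // Accumulate the decimal digits of the current octet.
                octet = octet * 10 + (c - '0');
            }
        }
        // The last octet is not followed by a dot.
        return result * 256 + octet;
    }
}
```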

So why is it called `painless`?

Although a quick search turned up no historical note on the name, the documentation and some experience (like the examples earlier in this answer) suggest that it is designed to be painless to use in production.

Its predecessor, Groovy, was a ticking bomb because of its resource usage and security vulnerabilities. So the Elasticsearch team created a very limited subset of Java/Groovy scripting that would have predictable performance and avoid those vulnerabilities, and called it Painless.

If there is one thing true about the Painless scripting language, it is that it is limited and sandboxed.

Amazing answer! Thanks for that. For some reason something still doesn't work. I'll try to understand what/why and accept the answer after that :) Thanks again! — Dekel, Nov 05 '18 at 09:46
Needed to use `doc['originating_ip.keyword'].value` in order to get the string value of the originating_ip field. Thanks! — Dekel, Nov 05 '18 at 10:00
