With Elasticsearch Painless scripting, you can use code like the following:
POST ip_search/doc/_search
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "originating_ip_calc": {
      "script": {
        "source": """
          String ip_addr = params['_source']['originating_ip'];
          def ip_chars = ip_addr.toCharArray();
          int chars_len = ip_chars.length;
          long result = 0;
          int cur_power = 0;
          int last_dot = chars_len;
          // Scan right to left; i == -1 acts as a virtual dot before the first char
          for (int i = chars_len - 1; i >= -1; i--) {
            if (i == -1 || ip_chars[i] == (char) '.') {
              // Octet between this dot (or the string start) and the previous dot
              result += (Integer.parseInt(ip_addr.substring(i + 1, last_dot)) * Math.pow(256, cur_power));
              last_dot = i;
              cur_power += 1;
            }
          }
          return result;
        """,
        "lang": "painless"
      }
    }
  },
  "_source": ["originating_ip"]
}
(Note that I used the Kibana console to send the request to ES; it escapes the triple-quoted script so that the body is valid JSON before sending it.)
This will give a response like this:
"hits": [
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "2",
    "_score": 1,
    "_source": {
      "originating_ip": "10.0.0.1"
    },
    "fields": {
      "originating_ip_calc": [
        167772161
      ]
    }
  },
  {
    "_index": "ip_search",
    "_type": "doc",
    "_id": "1",
    "_score": 1,
    "_source": {
      "originating_ip": "1.2.3.4"
    },
    "fields": {
      "originating_ip_calc": [
        16909060
      ]
    }
  }
]
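As a sanity check on these numbers, here is the same right-to-left scan ported to plain Java. It is only a sketch for local testing, not something Elasticsearch runs; the names IpToLong and ipToLong are made up for illustration. It uses a bit shift instead of Math.pow(256, cur_power) so the arithmetic stays in long instead of going through double.

```java
public class IpToLong {
    // Converts a dotted IPv4 string to its numeric value by scanning right to left,
    // mirroring the Painless script above.
    static long ipToLong(String ipAddr) {
        char[] ipChars = ipAddr.toCharArray();
        long result = 0;
        int curPower = 0;
        int lastDot = ipChars.length;
        // i == -1 acts as a virtual dot before the first character;
        // the || short-circuits, so ipChars[-1] is never read
        for (int i = ipChars.length - 1; i >= -1; i--) {
            if (i == -1 || ipChars[i] == '.') {
                long octet = Integer.parseInt(ipAddr.substring(i + 1, lastDot));
                // Each octet is worth 256^curPower, i.e. a shift of 8 bits per position
                result += octet << (8 * curPower);
                lastDot = i;
                curPower++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ipToLong("10.0.0.1")); // 167772161
        System.out.println(ipToLong("1.2.3.4"));  // 16909060
    }
}
```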
But why does it have to be this way?
Why does the approach with .split not work?
If you send the code from the question to ES, it replies with an error like this:
"script": "String[] ipAddressInArray = \"1.2.3.4\".split(\"\\\\.\");\n\nlong result = 0;\nfor (int i = 0; i < ipAddressInArray.length; i++) {\n int power = 3 - i;\n int ip = Integer.parseInt(ipAddressInArray[i]);\n long longIP = (ip * Math.pow(256, power)).toLong();\n result = result + longIP;\n}\nreturn result;",
"lang": "painless",
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Unknown call [split] with [1] arguments on type [String]."
This is mainly because Java's String.split() is not considered safe to use in Painless: it treats its argument as a regular expression, so it creates a regex Pattern implicitly. The suggested alternative is Pattern#split, but to use it you need regexes enabled in the node configuration (elasticsearch.yml), not in your index.
By default, they are disabled:
"script": "String[] ipAddressInArray = /\\./.split(\"1.2.3.4\");...
"lang": "painless",
"caused_by": {
"type": "illegal_state_exception",
"reason": "Regexes are disabled. Set [script.painless.regex.enabled] to [true] in elasticsearch.yaml to allow them. Be careful though, regexes break out of Painless's protection against deep recursion and long loops."
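The regex behaviour of String.split is also why the question's script had to escape the dot: an unescaped "." is a regex that matches any character. A short plain-Java sketch of the difference, plus the explicit Pattern#split form (SplitDemo and splitOnDot are illustrative names, not part of any Elasticsearch API):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Precompiled regex matching a literal dot (the backslash escapes the metacharacter)
    static final Pattern DOT = Pattern.compile("\\.");

    // Explicit Pattern#split on the literal dot
    static String[] splitOnDot(String ip) {
        return DOT.split(ip);
    }

    public static void main(String[] args) {
        String ip = "1.2.3.4";
        // Unescaped "." means "any character": every char is a delimiter, all tokens
        // are empty trailing strings, and split() drops those, leaving nothing.
        System.out.println(ip.split(".").length);              // 0
        // Escaping the dot gives the intended result
        System.out.println(Arrays.toString(ip.split("\\.")));  // [1, 2, 3, 4]
        System.out.println(Arrays.toString(splitOnDot(ip)));   // [1, 2, 3, 4]
    }
}
```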
Why do we have to do an explicit cast (char) '.'?
Since split is not available, we have to split the string on dots manually. The straightforward approach is to compare each char of the string with '.' (which in Java is a char literal, not a String). In Painless, however, '.' is a String literal. So we have to cast it explicitly to char, because we are comparing it against the elements of a char array.
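For comparison, the manual split looks like this in plain Java. It is an illustrative sketch (ManualSplit and splitOnDots are hypothetical names); the only Painless-specific difference is the literal: in Java, '.' is already a char, so no cast is needed.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplit {
    // Splits on '.' without any regex by comparing each char to a char literal.
    static List<String> splitOnDots(String s) {
        List<String> parts = new ArrayList<>();
        char[] chars = s.toCharArray();
        int start = 0;
        for (int i = 0; i < chars.length; i++) {
            // Java: '.' is a char literal; Painless would need (char) '.' here
            if (chars[i] == '.') {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start)); // segment after the last dot
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnDots("1.2.3.4")); // [1, 2, 3, 4]
    }
}
```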
Why do we have to work with char array directly?
Because Painless does not accept .length on a String either (in Java, too, .length is a field that only arrays have; String exposes a length() method instead):
"reason": {
  "type": "script_exception",
  "reason": "compile error",
  "script_stack": [
    "\"1.2.3.4\".length",
    " ^---- HERE"
  ],
  "script": "\"1.2.3.4\".length",
  "lang": "painless",
  "caused_by": {
    "type": "illegal_argument_exception",
    "reason": "Unknown field [length] for type [String]."
  }
}
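Plain Java draws the same line between arrays and Strings, which is why going through the char array makes .length usable: arrays have a length field, Strings have a length() method. A small sketch (LengthDemo and charCount are illustrative names):

```java
public class LengthDemo {
    // Counts characters via the array's length field, as the Painless script does
    static int charCount(String s) {
        return s.toCharArray().length;
    }

    public static void main(String[] args) {
        String ip = "1.2.3.4";
        // Strings expose their length as a method...
        int viaMethod = ip.length();
        // ...arrays expose it as a field
        int viaField = charCount(ip);
        // int broken = ip.length;  // does not compile in Java either: no such field
        System.out.println(viaMethod + " " + viaField); // 7 7
    }
}
```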
So why is it called Painless?
I couldn't find any historical note on the name after a quick search, but from the documentation and my own experience (like the examples above in this answer) I infer that it is designed to be painless to use in production.
Its predecessor, Groovy, was a ticking bomb because of its resource usage and security vulnerabilities. So the Elasticsearch team created a very limited subset of Java/Groovy scripting with predictable performance and without those security vulnerabilities, and called it Painless.
If there is one thing that is certainly true about the Painless scripting language, it is that it is limited and sandboxed.