Elasticsearch lets us filter a set of documents by regex on any given field, and also group the resulting documents by the terms in a given (same or different) field, using "bucket aggregations". For example, on an index that contains a "Url" field and a "UserAgent" field (some kind of web server log), the following returns the top document counts for terms found in the UserAgent field:
{
  "query": { "filtered": { "filter": { "regexp": { "Url": ".*interestingpage.*" } } } },
  "size": 0,
  "aggs": { "myaggregation": { "terms": { "field": "UserAgent" } } }
}
What I'd like to do is use the power of the regexp filter (which operates on the whole field, not just terms within a field) to define my aggregation buckets manually, so that I can split my documents/counts/hits fairly reliably by "user agent type", rather than by the arbitrary terms Elasticsearch parses out of the field.
Basically, I am looking for the equivalent of a CASE statement in a GROUP BY, in SQL terms. The SQL query that would express my intent would be something like:
SELECT Bucket, COUNT(*)
FROM (
    SELECT CASE
        WHEN UserAgent LIKE '%android%' OR UserAgent LIKE '%ipad%' OR UserAgent LIKE '%iphone%' OR UserAgent LIKE '%mobile%' THEN 'Mobile'
        WHEN UserAgent LIKE '%msie 7.0%' THEN 'IE7'
        WHEN UserAgent LIKE '%msie 8.0%' THEN 'IE8'
        WHEN UserAgent LIKE '%firefox%' THEN 'FireFox'
        ELSE 'OTHER'
    END AS Bucket
    FROM pagedata
    WHERE Url LIKE '%interestingpage%'
) Buckets
GROUP BY Bucket
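To make the intended semantics concrete, here is a minimal Python sketch of the same bucketing done client-side on documents that have already been fetched. The names BUCKET_RULES, bucket_for and count_buckets are made up for illustration, and I am assuming case-insensitive substring matching (which is what LIKE does under a case-insensitive collation) with the first matching rule winning, as in CASE. This is only what I would like Elasticsearch to compute for me on the server side, not something I want to run in practice.

```python
from collections import Counter

# Hypothetical rules mirroring the CASE expression above.
# Order matters: the first rule with a matching substring wins.
BUCKET_RULES = [
    ("Mobile", ("android", "ipad", "iphone", "mobile")),
    ("IE7", ("msie 7.0",)),
    ("IE8", ("msie 8.0",)),
    ("FireFox", ("firefox",)),
]


def bucket_for(user_agent):
    """Return the bucket name for one UserAgent string (assumed case-insensitive)."""
    ua = user_agent.lower()
    for name, needles in BUCKET_RULES:
        if any(needle in ua for needle in needles):
            return name
    return "OTHER"


def count_buckets(docs, url_fragment="interestingpage"):
    """Count documents per bucket, keeping only those whose Url contains url_fragment."""
    return Counter(
        bucket_for(doc["UserAgent"])
        for doc in docs
        if url_fragment in doc["Url"]
    )
```

Called as count_buckets(docs) on a list of dicts with "Url" and "UserAgent" keys, it returns a Counter keyed by bucket name, which is the shape of result I am hoping to get back from the aggregation.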
Can this be done in an Elasticsearch query?