I have a list of domain objects that represent web access records. There can be thousands of these objects.

I don't have the resources or requirement to store them in a database in raw format, so instead I want to precompute aggregations and put the aggregated data in a database.

I need to aggregate the total bytes transferred in 5-minute windows, like the following SQL query:

select 
  round(request_timestamp, '5') as window, -- round timestamp to the nearest 5 minutes
  cdn, 
  isp, 
  http_result_code, 
  transaction_time, 
  sum(bytes_transferred)
from web_records
group by 
    round(request_timestamp, '5'), 
    cdn, 
    isp, 
    http_result_code, 
    transaction_time
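
For reference, the window used in the query could be computed in Java roughly as follows. This is only a sketch: the class `FiveMinuteWindows` and its `floor` method are hypothetical names, not part of the code in this question, and it rounds down to the start of the window rather than to the nearest boundary, which is an assumption about what `getFiveMinuteWindow` would do.

```java
import java.util.Date;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of the original WebRecord class.
final class FiveMinuteWindows {
    private static final long WINDOW_MILLIS = TimeUnit.MINUTES.toMillis(5);

    private FiveMinuteWindows() {}

    // Rounds a timestamp down to the start of its 5-minute window.
    // Windows are aligned to the epoch, so the result is independent of time zone.
    static Date floor(Date timestamp) {
        long millis = timestamp.getTime();
        // floorMod keeps the remainder non-negative, so pre-1970 timestamps also round down.
        return new Date(millis - Math.floorMod(millis, WINDOW_MILLIS));
    }
}
```

Every record whose timestamp falls within the same 5 minutes then maps to an equal `Date`, which is what makes it usable as a grouping key.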

In Java 8, my first attempt looks like this. I am aware this solution is similar to an answer in "Group by multiple field names in java 8":

Map<Date, Map<String, Map<String, Map<String, Map<String, Integer>>>>> aggregatedData =
webRecords
    .stream()
    .collect(Collectors.groupingBy(WebRecord::getFiveMinuteWindow,
               Collectors.groupingBy(WebRecord::getCdn,
                 Collectors.groupingBy(WebRecord::getIsp,
                   Collectors.groupingBy(WebRecord::getResultCode,
                       Collectors.groupingBy(WebRecord::getTxnTime,
                         Collectors.reducing(0,
                                             WebRecord::getReqBytes,
                                             Integer::sum)))))));

This works, but it's ugly; all those nested maps are a nightmare! To "flatten" or "unroll" the map into rows I have to do this:

for (Date window : aggregatedData.keySet()) {
  for (String cdn : aggregatedData.get(window).keySet()) {
    for (String isp : aggregatedData.get(window).get(cdn).keySet()) {
      for (String resultCode : aggregatedData.get(window).get(cdn).get(isp).keySet()) {
        for (String txnTime : aggregatedData.get(window).get(cdn).get(isp).get(resultCode).keySet()) {

           Integer bytesTransferred = aggregatedData.get(window).get(cdn).get(isp).get(resultCode).get(txnTime);
           AggregatedRow row = new AggregatedRow(window, cdn, distId...

As you can see this is pretty messy and difficult to maintain.

Anyone have any ideas of a better way to do this? Any help would be greatly appreciated.

I'm wondering if there is a nicer way to unroll the nested maps, or if there is a library that allows you to do a GROUP BY on a collection.

djhworld
  • 1
    Duplicate: http://stackoverflow.com/q/28342814 – Tunaki Sep 11 '15 at 20:23
  • 1
    Do you need to group by each of those fields individually, or can you group by those fields as a unit all at once? Also, you can use `Collectors.summingInt(WebRecord::getReqBytes)` instead of the `reducing` line. – Louis Wasserman Sep 11 '15 at 20:34
  • 1
    I am aware of the other question, in fact, my solution is pretty much the same as the top answer. However, I was hoping there would be a better way than using nested maps – djhworld Sep 11 '15 at 20:42
  • 3
    The top answer also provides a second option: "A second option is to define a class that represents the grouping." – Tunaki Sep 11 '15 at 20:42
  • Ah, I must have missed that. That's exactly what I wanted. Thanks, happy to close – djhworld Sep 11 '15 at 21:00

1 Answer

You should create a custom key for your map. The simplest way is to use Arrays.asList:

Function<WebRecord, List<Object>> keyExtractor = wr ->
    Arrays.<Object>asList(wr.getFiveMinuteWindow(), wr.getCdn(), wr.getIsp(),
             wr.getResultCode(), wr.getTxnTime());
Map<List<Object>, Integer> aggregatedData = webRecords.stream().collect(
      Collectors.groupingBy(keyExtractor, Collectors.summingInt(WebRecord::getReqBytes)));

In this case the keys are lists of 5 elements in a fixed order. Not quite object-oriented, but simple. Alternatively, you can define your own type that represents the custom key and give it proper hashCode/equals implementations.
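
A dedicated key type might look roughly like the sketch below. The names `AggregationKey` and `Aggregator` are hypothetical, and the `WebRecord` shown is only a minimal stand-in whose field types are assumed from the getters used in the question.

```java
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Minimal stand-in for the question's WebRecord; field types are assumptions.
final class WebRecord {
    private final Date fiveMinuteWindow;
    private final String cdn, isp, resultCode, txnTime;
    private final int reqBytes;

    WebRecord(Date fiveMinuteWindow, String cdn, String isp,
              String resultCode, String txnTime, int reqBytes) {
        this.fiveMinuteWindow = fiveMinuteWindow;
        this.cdn = cdn;
        this.isp = isp;
        this.resultCode = resultCode;
        this.txnTime = txnTime;
        this.reqBytes = reqBytes;
    }

    Date getFiveMinuteWindow() { return fiveMinuteWindow; }
    String getCdn() { return cdn; }
    String getIsp() { return isp; }
    String getResultCode() { return resultCode; }
    String getTxnTime() { return txnTime; }
    int getReqBytes() { return reqBytes; }
}

// Hypothetical key type: one instance per distinct combination of grouping fields.
final class AggregationKey {
    final Date window;
    final String cdn, isp, resultCode, txnTime;

    AggregationKey(WebRecord wr) {
        this.window = wr.getFiveMinuteWindow();
        this.cdn = wr.getCdn();
        this.isp = wr.getIsp();
        this.resultCode = wr.getResultCode();
        this.txnTime = wr.getTxnTime();
    }

    // Two keys are equal only when all five grouping fields are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AggregationKey)) return false;
        AggregationKey k = (AggregationKey) o;
        return Objects.equals(window, k.window)
            && Objects.equals(cdn, k.cdn)
            && Objects.equals(isp, k.isp)
            && Objects.equals(resultCode, k.resultCode)
            && Objects.equals(txnTime, k.txnTime);
    }

    // Must use the same fields as equals so equal keys land in the same bucket.
    @Override
    public int hashCode() {
        return Objects.hash(window, cdn, isp, resultCode, txnTime);
    }
}

final class Aggregator {
    private Aggregator() {}

    // Groups by the composite key and sums the bytes in each group.
    static Map<AggregationKey, Integer> totalBytesByKey(List<WebRecord> records) {
        return records.stream().collect(
            Collectors.groupingBy(AggregationKey::new,
                                  Collectors.summingInt(WebRecord::getReqBytes)));
    }
}
```

Because the result is a single flat map, unrolling it into rows is one loop over `entrySet()`: each entry's key carries all five grouping fields and its value is the byte total. Compared with the `List<Object>` key, the fields keep their names and types, at the cost of writing `equals` and `hashCode` by hand.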

Tagir Valeev