5

I have incoming objects with a flat, denormalized structure, which I instantiated from a JDBC result set. The incoming objects mirror the result set, so there is a lot of repeated data. I want to convert them into a list of parent objects with nested child collections, i.e. an object graph, or normalized list.

The incoming object's class looks like this:

class IncomingFlatItem {
    String clientCode;
    String clientName;
    String emailAddress;
    boolean emailHtml;
    String reportCode;
    String reportLanguage;
}

So the incoming data contains multiple objects for each client, which I'd like to aggregate into one client object, which contains a list of email address objects for the client, and a list of report objects.

So the Client object would look like this:

class Client {
    String clientCode;
    String clientName;
    Set<EmailAddress> emailAddresses;
    Set<Report> reports;
}

Strangely I can't find an existing answer for this. I am looking at nesting streams or chaining streams but I'd like to find the most elegant approach and I definitely want to avoid a for-loop.

Adam
  • why are you so definite about the for-loop approach? ;) – Andrew Tobilko Jun 14 '19 at 12:49
  • I don't need to ask how to do that! I want the functional approach learning experience :) – Adam Jun 14 '19 at 12:51
  • I have a similar scenario where I have another level like set of Subreports object within the Report object so in that case how can I use the Stream API ? – JSF Nov 07 '20 at 16:00

5 Answers

1

One thing you can do is use constructor parameters and a fluent API to your advantage. Thinking in terms of "nested" flows with the stream API (and dynamic data) can get complex very quickly.

This just uses a fluent API to simplify things (you can use a proper builder pattern instead).

class Client {
    String clientCode;
    String clientName;
    Set<EmailAddress> emailAddresses = new HashSet<>();
    Set<Report> reports = new HashSet<>();

    public Client(String clientCode, String clientName) {
        super();
        this.clientCode = clientCode;
        this.clientName = clientName;
    }

    public Client emailAddresses(String address, boolean html) {
        this.emailAddresses = 
             Collections.singleton(new EmailAddress(address, html));
        return this;
    }

    public Client reports(String... reports) {
        this.reports = Arrays.stream(reports)
                        .map(Report::new)
                        .collect(Collectors.toSet());
        return this;
    }

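    public Client merge(Client other) {
        // Collectors.reducing reuses one identity instance for every group,
        // so build a fresh Client instead of mutating either operand.
        Client merged = new Client(
                null != this.clientCode ? this.clientCode : other.clientCode,
                null != this.clientName ? this.clientName : other.clientName);
        merged.emailAddresses.addAll(this.emailAddresses);
        merged.emailAddresses.addAll(other.emailAddresses);
        merged.reports.addAll(this.reports);
        merged.reports.addAll(other.reports);
        return merged;
    }

    // Used as the grouping key via Client::getClientCode.
    public String getClientCode() {
        return clientCode;
    }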
}

class EmailAddress {
    public EmailAddress(String e, boolean html) {

    }
}

class Report {
    public Report(String r) {

    }
}

And...

Collection<Client> clients = incomingFlatItemsCollection.stream()
        .map(flatItem -> new Client(flatItem.clientCode, flatItem.clientName)
                          .emailAddresses(flatItem.emailAddress, flatItem.emailHtml)
                          .reports(flatItem.reportCode, flatItem.reportLanguage))
        .collect(Collectors.groupingBy(Client::getClientCode,
                Collectors.reducing(new Client(null, null), Client::merge)))
        .values();

Or you can also just use mapping functions that convert IncomingFlatItem objects to Client.

ernest_k
  • But this doesn't get rid of duplicate clients, does it? Duplicate client objects will appear when a client has multiple emails, or reports, each with only one object in its email or report collection. I have a bad feeling we are not talking about the same object structure. I updated the question. – Adam Jun 14 '19 at 14:47
  • Ah... Now I see. Sorry I missed the fact. Will update now. – ernest_k Jun 14 '19 at 14:58
  • @Adam I've made an edit. This should remove duplicates (by clientCode) – ernest_k Jun 14 '19 at 15:09
1

You can do something along the lines of using mapping functions to convert a List<IncomingFlatItem> to a Set<EmailAddress> or a Set<Report>:

Function<List<IncomingFlatItem>, Set<EmailAddress>> inferEmailAddress =
        incomingFlatItems -> incomingFlatItems.stream()
                .map(obj -> new EmailAddress(obj.getEmailAddress(), 
                                             obj.isEmailHtml()))
                .collect(Collectors.toSet());

Function<List<IncomingFlatItem>, Set<Report>> inferReports =
        incomingFlatItems -> incomingFlatItems.stream()
                .map(obj -> new Report(obj.getReportCode(), 
                                       obj.getReportLanguage()))
                .collect(Collectors.toSet());

and then use groupingBy and map the entries to a List<Client>:

List<Client> transformIntoGroupedNormalisedContent(
                  List<IncomingFlatItem> incomingFlatItemList) {
    return incomingFlatItemList.stream()
            .collect(Collectors.groupingBy(inc ->
                    Arrays.asList(inc.getClientCode(), inc.getClientName())))
            .entrySet()
            .stream()
            .map(e -> new Client(e.getKey().get(0), 
                                 e.getKey().get(1),
                                 inferEmailAddress.apply(e.getValue()), 
                                 inferReports.apply(e.getValue())))
            .collect(Collectors.toList());
}
Adam
Naman
  • Or possibly `new AbstractMap.SimpleEntry<>(inc.getClientCode(), inc.getClientName())` in grouping to avoid accessing elements with hardcoded index. – Naman Jun 14 '19 at 13:19
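
The suggestion in the comment above could look roughly like the sketch below. It is only an illustration: FlatItem is a hypothetical cut-down stand-in for IncomingFlatItem, and EntryKeyGrouping, key and byClient are names made up for this example. AbstractMap.SimpleEntry implements equals() and hashCode() over its key and value, so it can serve as a composite grouping key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for IncomingFlatItem, reduced to three fields.
class FlatItem {
    final String clientCode;
    final String clientName;
    final String emailAddress;

    FlatItem(String clientCode, String clientName, String emailAddress) {
        this.clientCode = clientCode;
        this.clientName = clientName;
        this.emailAddress = emailAddress;
    }
}

class EntryKeyGrouping {
    // Composite key: getKey() is the client code, getValue() the client name.
    static SimpleEntry<String, String> key(FlatItem item) {
        return new SimpleEntry<>(item.clientCode, item.clientName);
    }

    // Groups the flat rows by (clientCode, clientName).
    static Map<SimpleEntry<String, String>, List<FlatItem>> byClient(List<FlatItem> items) {
        return items.stream().collect(Collectors.groupingBy(EntryKeyGrouping::key));
    }

    public static void main(String[] args) {
        List<FlatItem> items = Arrays.asList(
                new FlatItem("A", "Acme", "a1@example.com"),
                new FlatItem("A", "Acme", "a2@example.com"),
                new FlatItem("B", "Bolt", "b1@example.com"));
        byClient(items).forEach((k, v) ->
                System.out.println(k.getKey() + " / " + k.getValue() + " has " + v.size() + " rows"));
    }
}
```

Compared with Arrays.asList(...) as the key, the entry gives named accessors (getKey() and getValue()) instead of get(0) and get(1), at the cost of only supporting two key parts.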
1

You can use this:

List<Client> clients = items.stream()
        .collect(Collectors.groupingBy(i -> Arrays.asList(i.getClientCode(), i.getClientName())))
        .entrySet().stream()
        .map(e -> new Client(e.getKey().get(0), e.getKey().get(1),
                e.getValue().stream().map(i -> new EmailAddress(i.getEmailAddress(), i.isEmailHtml())).collect(Collectors.toSet()),
                e.getValue().stream().map(i -> new Report(i.getReportCode(), i.getReportLanguage())).collect(Collectors.toSet())))
        .collect(Collectors.toList());

At the beginning you group your items by clientCode and clientName. After that you map the results to your Client object.

Make sure the .equals() and hashCode() methods are implemented for EmailAddress and Report to ensure they are distinct in the set.
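
As a minimal sketch of what that might look like for EmailAddress (the field names address and html are assumptions, since the question does not show the class, and EmailAddressDemo exists only for this example):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class; the fields are assumed from IncomingFlatItem.
final class EmailAddress {
    private final String address;
    private final boolean html;

    EmailAddress(String address, boolean html) {
        this.address = address;
        this.html = html;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EmailAddress)) return false;
        EmailAddress other = (EmailAddress) o;
        return html == other.html && Objects.equals(address, other.address);
    }

    @Override
    public int hashCode() {
        // Must use the same fields as equals().
        return Objects.hash(address, html);
    }
}

class EmailAddressDemo {
    // Adds one address twice plus a different one; returns the set size.
    static int distinctCount() {
        Set<EmailAddress> set = new HashSet<>();
        set.add(new EmailAddress("a@example.com", true));
        set.add(new EmailAddress("a@example.com", true));
        set.add(new EmailAddress("b@example.com", false));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 2
    }
}
```

Without these overrides the two "a@example.com" instances would be compared by identity and both would stay in the set. Report would need the same treatment over its code and language.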

Samuel Philipp
1

Thanks to all the answerers who mentioned Collectors.groupingBy(). This was key to setting up a stream where I could use reduce(). I had erroneously believed I should be able to use reduce on its own to solve the problem, without groupingBy.

Thanks also to the suggestion to create a fluent API. I added IncomingFlatItem.getEmailAddress() and IncomingFlatItem.getReport() to fluently grab the domain objects from IncomingFlatItem - and also a method to convert the whole flat item to a proper domain object with its email and report nested already:

public Client getClient() {
    Client client = new Client();
    client.setClientCode(clientCode);
    client.setClientName(clientName);
    client.setEmailAddresses(new ArrayList<>());
    client.getEmailAddresses().add(this.getEmailAddress());
    client.setReports(new ArrayList<>());
    client.getReports().add(this.getReport());
    return client;
}

I also created business ID-based .equals() and .hashCode() methods on Client, EmailAddress and Report as recommended by @SamuelPhilipp.

Lastly for the domain objects, I created .addReport(Report r) and .addEmail(EmailAddress e) on my Client class, which would add the child object to Client if not already present. I ditched the Set collection type for List because the domain model standard is List and Sets would have meant lots of conversions to Lists.

So with that, the stream code and lambdas look succinct.

There are 3 steps:

  1. map IncomingFlatItems to Clients
  2. group the Clients into a map by client (relying heavily on Client.equals())
  3. reduce each group to one Client

So this is the functional algorithm:

List<Client> unflatten(List<IncomingFlatItem> flatItems) {
    return flatItems.parallelStream()
            .map(IncomingFlatItem::getClient)
            .collect(Collectors.groupingByConcurrent(client -> client))
            .entrySet().parallelStream()
            .map(kvp -> kvp.getValue()
                    .stream()
                    .reduce(new Client(), 
                            (client1, client2) -> {
                                    client1.getReports()
                                            .forEach(client2::addReport);
                                    client1.getEmailAddresses()
                                            .forEach(client2::addEmail);
                                    return client2;
                    }))
            .collect(Collectors.toList());
}

This took me a long time because I went off on a tangent before I really understood reduce. I found a solution which passed my tests with .stream() but failed completely with .parallelStream(), hence its usage here. I also had to use CopyOnWriteArrayList, otherwise it would fail randomly with ConcurrentModificationExceptions.

Adam
  • Using `Stream.reduce()` is possibly the worst thing you could do, because it's violating the contract of `reduce()`. See [Reduction](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Reduction) and [Mutable Reduction](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#MutableReduction) or [Java 8 Streams - collect vs reduce](https://stackoverflow.com/q/22577197/9662601) for more information. Use `Stream.collect()` instead. – Samuel Philipp Jun 24 '19 at 21:14
  • Hi Samuel, thanks for the input but I think you're getting hung up on semantics. The stream reduces the flattened collection of multiple instances of each object to just one instance of the object, per group, where each group has multiple instances of an object, all having the same ID. You reference an SO question which discusses multiple different streaming issues - which one do you think is relevant? – Adam Jun 25 '19 at 10:59
  • I guess you're not visualizing the type of input data in the stream. I should have included an example so you get the right idea of what it's doing. Now that I've written a ton of tests for the stream, I've got lots of data so if I get a moment I'll paste some into the question for the sake of completeness. – Adam Jun 25 '19 at 11:08
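
For readers following the comment thread, here is a rough sketch of the collect()-based alternative that the first comment points to. It is not the answer author's code: Row, ClientAgg and CollectDemo are hypothetical names, and emails and reports are reduced to plain strings to keep it short. The idea is that Collector.of takes a supplier, so every group (and every worker thread in a parallel stream) gets its own fresh container to mutate, rather than sharing one identity object as reduce() would.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical flat row, standing in for IncomingFlatItem.
class Row {
    final String clientCode;
    final String emailAddress;
    final String reportCode;

    Row(String clientCode, String emailAddress, String reportCode) {
        this.clientCode = clientCode;
        this.emailAddress = emailAddress;
        this.reportCode = reportCode;
    }
}

// Hypothetical mutable accumulator, standing in for Client.
class ClientAgg {
    String clientCode;
    final Set<String> emails = new LinkedHashSet<>();
    final Set<String> reports = new LinkedHashSet<>();

    // Accumulator step: fold one flat row into this container.
    void add(Row row) {
        clientCode = row.clientCode;
        emails.add(row.emailAddress);
        reports.add(row.reportCode);
    }

    // Combiner step: merge two partial containers (used by parallel streams).
    ClientAgg combine(ClientAgg other) {
        if (clientCode == null) clientCode = other.clientCode;
        emails.addAll(other.emails);
        reports.addAll(other.reports);
        return this;
    }
}

class CollectDemo {
    static Collector<Row, ClientAgg, ClientAgg> toClient() {
        // Supplier, accumulator, combiner: a mutable reduction.
        return Collector.of(ClientAgg::new, ClientAgg::add, ClientAgg::combine);
    }

    static Map<String, ClientAgg> unflatten(List<Row> rows) {
        return rows.parallelStream()
                .collect(Collectors.groupingBy(r -> r.clientCode, toClient()));
    }

    public static void main(String[] args) {
        Map<String, ClientAgg> byCode = unflatten(Arrays.asList(
                new Row("A", "a1@example.com", "R1"),
                new Row("A", "a2@example.com", "R1"),
                new Row("B", "b1@example.com", "R2")));
        byCode.forEach((code, c) -> System.out.println(code + " " + c.emails + " " + c.reports));
    }
}
```

Because each container is created by the supplier and only ever touched by one thread at a time, this sketch needs neither CopyOnWriteArrayList nor groupingByConcurrent.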
0

If you don't like to iterate over entry sets (don't want to handle Map.Entry) or prefer a different solution without groupingBy, you can also use toMap with a merge function to aggregate your values. This approach works nicely because Client can hold both the initial single item and the accumulated collection of all EmailAddress objects. (Note: I used the utility function com.google.common.collect.Sets.union for conciseness, but you can just work with e.g. HashSet.)

The following code demonstrates how to do it (add Reports in the same manner as EmailAddress, and add the other fields you want). I left the merge function inline and did not add an AllArgsConstructor, but feel free to refactor.

static Client mapFlatItemToClient(final IncomingFlatItem item) {
    final Client client = new Client();
    client.clientCode = item.clientCode;
    client.emailAddresses = Collections.singleton(mapFlatItemToEmail(item));
    return client;
}

static EmailAddress mapFlatItemToEmail(final IncomingFlatItem item) {
    final EmailAddress address = new EmailAddress();
    address.emailAddress = item.emailAddress;
    return address;
}

public static void example() {
    final List<IncomingFlatItem> items = new ArrayList<>();

    // Aggregated Client Info by Client Code
    final Map<String, Client> intermediateResult = items.stream()
            .collect(
                    Collectors.<IncomingFlatItem, String, Client> toMap(
                            flat -> flat.clientCode,
                            flat -> mapFlatItemToClient(flat),
                            (lhs, rhs) -> {
                                final Client client = new Client();
                                client.clientCode = lhs.clientCode;
                                client.emailAddresses = Sets.union(lhs.emailAddresses, rhs.emailAddresses);
                                return client;
                            }));

    final Collection<Client> aggregatedValues = intermediateResult.values();
}
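
If you would rather not pull in Guava, the merge function can build the union with a plain HashSet, as mentioned above. A minimal sketch under simplifying assumptions: Item is a hypothetical two-field stand-in for IncomingFlatItem, and the map values are bare sets of email strings rather than Client objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for IncomingFlatItem.
class Item {
    final String clientCode;
    final String emailAddress;

    Item(String clientCode, String emailAddress) {
        this.clientCode = clientCode;
        this.emailAddress = emailAddress;
    }
}

class ToMapDemo {
    static Map<String, Set<String>> emailsByClient(List<Item> items) {
        return items.stream().collect(Collectors.toMap(
                item -> item.clientCode,
                item -> {
                    // Value for a single row: a one-element set.
                    Set<String> single = new HashSet<>();
                    single.add(item.emailAddress);
                    return single;
                },
                (lhs, rhs) -> {
                    // Merge function: union into a new set, inputs untouched.
                    Set<String> union = new HashSet<>(lhs);
                    union.addAll(rhs);
                    return union;
                }));
    }

    public static void main(String[] args) {
        System.out.println(emailsByClient(Arrays.asList(
                new Item("A", "a1@example.com"),
                new Item("A", "a2@example.com"),
                new Item("B", "b1@example.com"))));
    }
}
```

Copying into a new HashSet on every merge is simple but allocates one set per merged row; Sets.union avoids the copy by returning a view.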
sfiss