9

Using only the standard Java library, what is a simple mechanism to join strings up to a limit, and append an ellipsis when the limit results in a shorter string?

Efficiency is desirable. Joining all the strings and then using String.substring() may consume excessive memory and time. A mechanism that can be used within a Java 8 stream pipeline is preferable, so that the strings past the limit might never even be created.

For my purposes, I would be happy with a limit expressed in either:

  • Maximum number of strings to join
  • Maximum number of characters in result, including any separator characters.

For example, this is one way to enforce a maximum number of joined strings in Java 8 with the standard library. Is there a simpler approach?

final int LIMIT = 8;

Set<String> mySet = ...;
String s = mySet.stream().limit( LIMIT ).collect( Collectors.joining(", "));
if ( LIMIT < mySet.size()) {
    s += ", ...";
}
Tunaki
Andy Thomas
  • This question is NOT asking to "recommend or find a book, tool, software library, tutorial or other off-site resource." – Andy Thomas Mar 21 '16 at 13:03

3 Answers

9

You can write your custom collector for this. This one is based on another I wrote for a similar case:

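import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;

private static Collector<String, List<String>, String> limitingJoin(String delimiter, int limit, String ellipsis) {
    return Collector.of(
                ArrayList::new,
                // Accumulator: keep the first `limit` elements, then add the ellipsis exactly once.
                (l, e) -> {
                    if (l.size() < limit) l.add(e);
                    else if (l.size() == limit) l.add(ellipsis);
                },
                // Combiner: top up the left list from the right one without exceeding the limit.
                (l1, l2) -> {
                    int room = Math.max(0, limit - l1.size());
                    l1.addAll(l2.subList(0, Math.min(l2.size(), room)));
                    // Add the ellipsis only if the right list held more than was taken.
                    if (l1.size() == limit && l2.size() > room) l1.add(ellipsis);
                    return l1;
                },
                // Finisher: join the kept elements (and the ellipsis, if any).
                l -> String.join(delimiter, l)
           );
}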

In this code, we keep an ArrayList<String> of the encountered Strings, up to the limit. When an element is accepted, the size of the current list is tested against the limit: if it is strictly less, the element is added; if it is equal, the ellipsis is added instead. The combiner does the same, but is a bit trickier because the sizes of the two lists must be handled so that the result does not go over the limit. Finally, the finisher joins the list with the given delimiter.

This implementation works for parallel Streams. It keeps the head elements of the Stream in encounter order. Note that it consumes all the elements of the Stream, even though no elements are added once the limit has been reached.

Working example:

List<String> list = Arrays.asList("foo", "bar", "baz");
System.out.println(list.stream().collect(limitingJoin(", ", 2, "..."))); // prints "foo, bar, ..."
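If visiting every element is a concern, one way to bound the traversal with only the standard library is to apply `limit(max + 1)` before collecting, so the extra element serves only to reveal that something was cut off. The following is a minimal sketch of that idea, not part of the collector above; the names `LimitedJoin` and `joinLimited` are hypothetical, and it assumes the stream's encounter order is the order you want in the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LimitedJoin {
    // Hypothetical helper: joins at most `max` strings and appends `ellipsis`
    // as a final item when the stream held more than `max` elements.
    static String joinLimited(Stream<String> strings, String delimiter, int max, String ellipsis) {
        // Pull at most max + 1 elements; the extra one only reveals truncation.
        List<String> head = strings.limit((long) max + 1)
                                   .collect(Collectors.toCollection(ArrayList::new));
        if (head.size() > max) {
            head.set(max, ellipsis); // replace the surplus element with the ellipsis
        }
        return String.join(delimiter, head);
    }

    public static void main(String[] args) {
        System.out.println(joinLimited(Stream.of("foo", "bar", "baz"), ", ", 2, "..."));
        // An infinite stream still terminates because limit() short-circuits.
        System.out.println(joinLimited(Stream.generate(() -> "x"), ", ", 3, "..."));
    }
}
```

Because `limit()` is a short-circuiting operation, elements past `max + 1` are never pulled from a lazy source, at the cost of buffering the kept strings in a list.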
Community
Tunaki
  • Thank you. Consuming all the elements in the Stream could be avoided, if necessary, by inserting `limit(LIMIT + 1)` before the `collect()`. – Andy Thomas Mar 04 '16 at 18:37
  • 1
    @AndyThomas Yes. But the collector itself is not short-circuiting. – Tunaki Mar 04 '16 at 18:38
  • 1
    +1 for nice implementation of number-of-string joiner. A number-of-characters joiner would be more complex. – Andreas Mar 04 '16 at 21:48
9

While using third-party code is not an option for the asker, it might be acceptable for other readers. Even if you write a custom collector, you still have a problem: the whole input will be processed, because standard collectors cannot short-circuit (in particular, it is impossible to process an infinite Stream). My StreamEx library enhances the collector concept, making it possible to create short-circuiting collectors. A Joining collector is also readily provided:

StreamEx.of(mySet).collect( 
    Joining.with(", ").ellipsis("...").maxChars(100).cutAfterDelimiter() );

The result is guaranteed not to exceed 100 characters. Different counting strategies can be used: you can limit by chars, by code points, or by graphemes (combining Unicode characters will not be counted). You can also cut the result at any position ("First entry, second en..."), after a word ("First entry, second ..."), after a delimiter ("First entry, ..."), or before a delimiter ("First entry, second entry..."). It also works for parallel streams, though probably not very efficiently in the ordered case.
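To illustrate why the three counting units can disagree, here is a small standard-library sketch. It does not use StreamEx and makes no claim about how StreamEx counts internally; the class `CountingUnits` and its `graphemes` helper are hypothetical names, and `BreakIterator` is used as an approximation of grapheme boundaries.

```java
import java.text.BreakIterator;

class CountingUnits {
    // Counts user-perceived characters as java.text.BreakIterator delimits them.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String accented = "e\u0301";   // 'e' followed by a combining acute accent
        String emoji = "\uD83D\uDE00"; // one code point stored as a surrogate pair

        System.out.println(accented.length());                             // chars
        System.out.println(accented.codePointCount(0, accented.length())); // code points
        System.out.println(graphemes(accented));                           // graphemes

        System.out.println(emoji.length());                                // chars
        System.out.println(emoji.codePointCount(0, emoji.length()));       // code points
    }
}
```

The accented letter occupies two chars and two code points but reads as one character, while the emoji occupies two chars but is a single code point, so a limit of 100 can mean different cut points depending on the unit chosen.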

Tagir Valeev
1

Using only the standard Java library

I don't believe there is anything in there that can do what you ask.

You need to write your own Collector. It won't be that complicated, so I don't see why writing your own would be an issue.
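As a rough illustration of what such a Collector might look like for a limit expressed in characters, here is a minimal sketch. All names (`CharLimitJoin`, `joiningUpTo`, `State`) are hypothetical. It assumes that only whole strings are kept, that delimiters count toward the limit, and that the ellipsis is appended on top of the limit rather than inside it. Like any standard collector, it does not short-circuit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CharLimitJoin {
    // Mutable accumulation state: strings kept so far, their joined length,
    // and whether at least one string was dropped.
    private static final class State {
        final List<String> kept = new ArrayList<>();
        int length = 0;
        boolean truncated = false;
    }

    // Hypothetical collector: keeps whole strings while the joined text
    // (delimiters included) stays within maxChars. The ellipsis is added
    // on top of that budget, which is an assumption of this sketch.
    static Collector<String, ?, String> joiningUpTo(String delimiter, int maxChars, String ellipsis) {
        return Collector.<String, State, String>of(
            State::new,
            (state, s) -> add(state, s, delimiter, maxChars),
            (left, right) -> {
                // Replay the right-hand strings in order, then carry over its truncation flag.
                for (String s : right.kept) add(left, s, delimiter, maxChars);
                if (right.truncated) left.truncated = true;
                return left;
            },
            state -> {
                if (state.truncated) state.kept.add(ellipsis);
                return String.join(delimiter, state.kept);
            });
    }

    private static void add(State state, String s, String delimiter, int maxChars) {
        if (state.truncated) return; // once a string is dropped, drop the rest too
        int needed = (state.kept.isEmpty() ? 0 : delimiter.length()) + s.length();
        if (state.length + needed <= maxChars) {
            state.kept.add(s);
            state.length += needed;
        } else {
            state.truncated = true;
        }
    }

    public static void main(String[] args) {
        String joined = Stream.of("foo", "bar", "baz").collect(joiningUpTo(", ", 8, "..."));
        System.out.println(joined); // prints "foo, bar, ..."
    }
}
```

Making the ellipsis fit inside the limit would require reserving room for it before knowing whether truncation happens, which is the main source of the extra complexity.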

Andreas
  • That has occurred to me. I could `limit(LIMIT + 1)` to traverse only part of the stream, but still allow the custom collector to know that the limit has been reached. Was hoping there was something simpler that I had missed. – Andy Thomas Mar 04 '16 at 18:22
  • Limiting the *number* of strings seems meaningless to me. It should limit the *length* of the result, which means that `limit()` is not the answer. – Andreas Mar 04 '16 at 18:24
  • Certainly in some cases that is true. However, I have a case where the strings are known a priori to be short; an exact limit in characters is not necessary; and I would want only whole strings to be included. – Andy Thomas Mar 04 '16 at 18:34