-2

I have been crashing my head these days trying to figure out how can I use Java 8 stream API perform a reduction and at the same time maintain an index. Here is one example:

I have the following string:

String charSequence = "kjsfjsfajdsfjsaaaaaasssddfddddbbbdddaaa";

I would like as a result of a Stream operation to return a triplet (I, N, C)

Where:

  • C is a character
  • N is a number of occurrences - should be maximum
  • I is the index of the first occurrence of the string (first if there are more than one)

Examples:

  • "ddaaaacccjcccccjjj" returns (10, 5, c)
  • "ddaaacccaaa" the first occurrence of "aaa" is 2 so the result would be: (2, 3, a)
Alexander Petrov
  • 9,204
  • 31
  • 70
  • 1
    Think yourself - you asked a not so bad question IMO and the top rated answer tells you why you should not do it; not even remotely suggesting a possibility :) this is doomed to be a weird treated question – Eugene Sep 14 '18 at 13:27
  • @Eugene sometimes example of antipattern can be a very good example. I don't think this makes the question bad. It place the thing on a different perspective. – Alexander Petrov Sep 14 '18 at 13:30
  • if only you would be asking for an anti-pattern right? I mean does that answer satisfy you? Now I am just wondering, is it me being weird? – Eugene Sep 14 '18 at 13:31
  • @Eugene You are right. – Alexander Petrov Sep 14 '18 at 13:32
  • Dont take my words too literaly. It was just a suggestion that anti questions can be good questions. – Alexander Petrov Sep 14 '18 at 13:35
  • Why a question with concrete problem, even with example input and output is closed with feedback "Update question so it focused on one problem only" I am posting example input and output. How can it be more focused than this ? – Alexander Petrov Mar 05 '21 at 21:48

4 Answers4

6

I am trying to understand why maintaining index is difficult...

The purpose of Stream API is to perform operations through pipelines focused on the elements, not the indices. Indexing of every element requires a sequential processing of Stream which is in conflict with the point of parallel streaming which would have to synchronize to work with indices - and it kills the idea.

or why it can not be done if that is the case.

Yet, there is surprisingly still a way of iterating two or more sources (Collection, array...) at once using IntStream::range to iterate the index itself that is iterated:

IntStream.range(0, 10).map(i -> list.get(i) + array[i])...

... trying figure out how can I using Java 8 Stream API perform a reduction and at the same time maintain an index

... but neither the solution above nor any other care the previous n elements. The processed element should be independent of the others.

Forget the Stream API in this case. Go back to traditional and procedural for-loop. You can get a result with a single loop.

Nikolas Charalambidis
  • 40,893
  • 16
  • 117
  • 183
  • Thanks you have +1 from me. I like your example with the IntStream. – Alexander Petrov Sep 14 '18 at 12:45
  • 3
    @Nikolas this remind me of GhostCat answers now (which have their truth). OP asks how do I do X and the top rated answer says: don't, because Y and Z; even if there are two answers here showing, yup, you can. – Eugene Sep 14 '18 at 13:30
  • @Eugene: I have read some here and chose a more descriptive style of the answer instead. You *can* use Streams here (answers are already provided), but there is no nice and smooth way - thus you *should not* use them. Do you agree? :)) – Nikolas Charalambidis Sep 14 '18 at 13:39
  • 2
    @Nikolas I don't; but it's not me asking this question in the first place, so you should not care (and please don't). – Eugene Sep 14 '18 at 13:42
  • @Eugene. If you disagree, down-vote me then. Yet, I am glad for the feedback. Thanks anyway. – Nikolas Charalambidis Sep 14 '18 at 16:15
  • @Nikolas I will not downvote, its not my question ;) – Eugene Sep 14 '18 at 16:21
2

Don't think that streams are very helpful here. Here is a solution using regex and a List of of results:

    Pattern p = Pattern.compile("(\\w)\\1+");
    Matcher m = p.matcher(charSequence);

    List<Triple> list = new ArrayList<>();

    while (m.find()) {
        int start = m.start();
        int end = m.end();
        int diff = end - start;

        if (list.isEmpty()) {
            list.add(new Triple(m.group(0).charAt(0), diff, start));
        } else if (list.get(list.size() - 1).getN() == diff) {
            list.add(new Triple(m.group(0).charAt(0), diff, start));
        } else if (diff > list.get(list.size() - 1).getN()) {
            list.clear();
            list.add(new Triple(m.group(0).charAt(0), diff, start));
        }
    }

And a Triple:

   static class Triple {
    private final Character c;

    private final long n;

    private final int i;

    public Triple(Character c, long n, int i) {
        this.c = c;
        this.n = n;
        this.i = i;
    }

    // getters

} 

I have a solution like this for example:

 List<Triple> result = p.matcher(charSequence).results()
            .collect(
                    Collector.of(
                            ArrayList::new,
                            (l, mr) -> {
                                int diff = mr.end() - mr.start();

                                if (!l.isEmpty() && l.get(l.size() - 1).getN() < diff) {
                                    l.clear();
                                }

                                if (l.isEmpty() || l.get(l.size() - 1).getN() == diff) {
                                    l.add(new Triple(mr.group(0).charAt(0), diff, mr.start()));
                                }

                            },
                            (left, right) -> {
                                throw new UnsupportedOperationException("Not for parallel");
                            }));
Eugene
  • 117,005
  • 15
  • 201
  • 306
  • I am not really looking for an alternative solution here. I am trying to either find a way to implement it with streams or find good arguments why it can not be done or at least example of a very bad solution to understand the weaknesses of the stream API – Alexander Petrov Sep 14 '18 at 12:07
  • Even bad solution with streams is still fine with me. – Alexander Petrov Sep 14 '18 at 12:09
  • 2
    @AlexandarPetrov see edit – Eugene Sep 14 '18 at 12:20
  • jesus :) I guess I need to install Java 9 to test it. You will have plus 1 from at least for the effort. But can you think of some arguments why this task is difficult with streams. If you are trying to explain it to a third person how would you explain it ? – Alexander Petrov Sep 14 '18 at 12:29
  • I think it may be possible for one of the collectors to be avoided if use sequential stream. – Alexander Petrov Sep 14 '18 at 12:34
  • @AlexandarPetrov I don't find it difficult. Java 9 has some nice APIs that deal very well with regexps/patterns/matchers and streams, it's just that you need to get used to this API. Once you do, you'll see there are no mysteries here. I think this solution can be improved by not using a `TreeMap`, but just one list holding triplets for the max-length character. – fps Sep 14 '18 at 12:36
  • @FedericoPeraltaSchaffner agree – Alexander Petrov Sep 14 '18 at 12:37
  • 1
    @FedericoPeraltaSchaffner updated, but I really liked the `TreeMap`, this is a tiny bit less verbose though – Eugene Sep 14 '18 at 13:23
2

It could be like this:

I wrote in several step.

result1 : the first step splits repeated character then grouping it by first char.

example for char c:

{'c',["ccc","ccccc"]}

Map<Character,List<String>> result1 =  Stream.of(charSequence.split("(?<=(.))(?!\\1)"))
            .collect(Collectors.groupingBy(s->s.charAt(0)));

result2 : in this step the result is list of string with max length of previous result. as you see we had ["ccc","ccccc"], so here we just use ccccc sequence.

  List<String> result2 =   result1.entrySet()
            .stream()
            .map(entry->entry.getValue()
                .stream().max(Comparator.comparingInt(String::length)).get())
            .collect(Collectors.toList());

result: final step is your expected result.

 List<Triple> result =  result2
          .stream()
          .map(str1->new Triple(str1.charAt(0),str1.length(),charSequence.indexOf(str1)))
          .collect(Collectors.toList());

 Stream.of(charSequence.split("(?<=(.))(?!\\1)"))
        .collect(groupingBy(s -> s.charAt(0), 
            collectingAndThen(maxBy(comparingInt(String::length)), Optional::get)))
        .entrySet().stream()
        .map(m1 -> new Triple(m1.getKey(), m1.getValue().length(), charSequence.indexOf(m1.getValue())))
        .collect(Collectors.toList());
Hadi J
  • 16,989
  • 4
  • 36
  • 62
  • 3
    whenever your downstream collector performs a reduction and you find yourself needing the `collectingAndThen(…, Optional::get)`, you may check whether `toMap` is more appropriate. E.g. in your case `.collect(toMap(s -> s.charAt(0), s -> s, BinaryOperator.maxBy(comparingInt(String::length))))` – Holger Sep 14 '18 at 16:20
2

Think different and look outside. Here is one of the alternative solutions by StreamEx, and you may not accept it by your declaration:

String str = "ddaaaacccjcccccjjj";

IntStreamEx.range(0, str.length()).boxed() 
    .collapse((i, j) -> str.charAt(i) == str.charAt(j), Collectors.toList()) 
    .maxBy(l -> l.size()) 
    .map(l -> Triple.of(l.get(0), l.size(), str.charAt(l.get(0))))
    .ifPresent(System.out::println);

// output: [10, 5, c]

And to get all:

String str = "ddaaacccaaa";

IntStreamEx.range(0, str.length()).boxed() 
    .collapse((i, j) -> str.charAt(i) == str.charAt(j), Collectors.toList()) 
    .collect(MoreCollectors.maxAll(Comparators.comparingBy(l -> l.size()))) 
    .stream().map(l -> Triple.of(l.get(0), l.size(), str.charAt(l.get(0))))
    .forEach(System.out::println);

// output
// [2, 3, a]
// [5, 3, c]
// [8, 3, a]

To distinct the result by character:

Collector<List<Integer>, ?, StreamEx<List<Integer>>> collector = Collectors.collectingAndThen(
    MoreCollectors.maxAll(Comparators.comparingBy(l -> l.size())), StreamEx::of);

IntStreamEx.range(0, str.length()).boxed() 
    .collapse((i, j) -> str.charAt(i) == str.charAt(j), Collectors.toList()) 
    .collect(collector) 
    .distinct(l -> str.charAt(l.get(0))) 
    .map(l -> Triple.of(l.get(0), l.size(), str.charAt(l.get(0)))) 
    .forEach(System.out::println);

// output
// [2, 3, a]
// [5, 3, c]

Update: Is it good enough? actually no, because it creates unnecessary temporary List. I think there is a better solution by intervalMap.

IntStreamEx.range(0, str.length()).boxed()
    .intervalMap((i, j) -> str.charAt(i) == str.charAt(j), Pair::of)
    .maxBy(p -> p.right - p.left)
    .map(p -> Triple.of(p.left, p.right - p.left + 1, str.charAt(p.left)))
    .ifPresent(System.out::println);
user_3380739
  • 1
  • 14
  • 14
  • I actually think this solution is awesome. – Alexander Petrov Sep 14 '18 at 17:56
  • @AlexandarPetrov it is awesome indeed, but requires knowledge of `StreamEx`... Still +1, this looks nice indeed. Btw, you could accept an answer here and accept it – Eugene Sep 17 '18 at 14:04
  • @Eugene I will. Haven't got time to go in details of all the solutions yet. I think that this solution can be translated to normal stream if collapse is implemented. – Alexander Petrov Sep 17 '18 at 14:10