2

General problem I don't know how to divide a sorted list to smaller sorted lists but NOT in a manner like in Guava Lists.partition(list,size) - so not to a smaller lists of specified size BUT to a fixed number of lists of similar size.

For example having a source list: 1,2,3,4 I want to have 3 lists as a results (the 3 is the fixed number of resulting lists). I should have the results List<List<Long>>: ListOne: 1, ListTwo: 2, ListThree: 3,4 (keep in mind that sorting is kept). When source list is smaller then number of target lists, then OK, I could get smaller number of lists. So whe a source list is 1,2 and I want to have 3 lists, the algorithm should return two lists: List1 1, List2: 2.

The size of the source list is unknown, but there are hundret of thousends of elements that have to be divided to 10 lists, becaouse then 10 threads are ready to do some more complicated operations with those elements.

The below algorithm is totally wrong, having 14 elements on source list and passing GRD_SIZE=10 it returns 7 lists of 2 elements. It should return GRD_SIZE=10 lists of similar sizes. Propably I shouldn't also use the Guava Lists.partition method... But how to do this task ?

List<List<Long>> partitions = partition(sourceList, GRD_SIZE);

public static <T> List<List<T>> partition(List<T> ascSortedItems, int size)
{
     int threadSize =  (int) Math.ceil(
         new BigDecimal(ascSortedItems.size()).divide(
             new BigDecimal(
                 ascSortedItems.size() >= size ? size : ascSortedItems.size()
             )
          ).doubleValue()
     );

     final AtomicInteger counter = new AtomicInteger(0);

     return ascSortedItems.stream()
         .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / threadSize))
         .values()
         .stream()
         .collect(Collectors.toList());
}
michealAtmi
  • 1,012
  • 2
  • 14
  • 36
  • Given `n` as `GRD_SIZE` couldn't you just create `n` lists and iterate over the elements of the source list, distributing it to the `n` lists. So in general the `i-th` element of the source list, is placed in the `i % n`-th new list. Repeat until the end of the source list is reached, then return the list of new lists. – GxTruth Jul 20 '18 at 12:14
  • The problem is not well defined. How does "List1: 1, List2: 2, List3: 3,4" meet the requirement of " fixed number of lists of similar size" ? Can lists overlap ? – c0der Jul 20 '18 at 12:18
  • This fixed number of lists is 3. The source list is 1,2,3,4. I must somehow put in words the fact, that the resulting lists won't have sometimes the same size, so I use similar size term. Not always u would be able to divide the source collection to a lists of the same size. Only the resulting number of lists is known. So what would u like to know additionally? – michealAtmi Jul 20 '18 at 12:30
  • GxTruth, I can't do that - this won't keep the order of elements. – michealAtmi Jul 20 '18 at 12:31
  • Isn't partition (chunking by size) the same thing but with one less math? I mean, if you know how many lists, and you know the source list length, isn't it just division? Maybe I'm misreading the question. – Dave Newton Jul 20 '18 at 13:49
  • 2
    @DaveNewton the problem, as far as I understood, is that if the source list’s size is not a multiple of the desired list size, like with the example of fourteen elements to be split into ten lists, calculating a chunk size first will not produce the desired number of lists. So the two problem are closely related, but different. See [my answer](https://stackoverflow.com/a/51444812/2711488) for more details. – Holger Jul 20 '18 at 14:33

3 Answers3

3

First, you should not use a mutable counter like in your attempted solution. A canonical alternative would be like in this answer, which, after adapting to your problem, would look like:

return IntStream.range(0, (ascSortedItems.size()+threadSize-1)/threadSize)
    .mapToObj(i -> ascSortedItems
          .subList(i*threadSize, Math.min(ascSortedItems.size(), (i+1)*threadSize)))
    .collect(Collectors.toList());
}

The calculation of threadSize doesn’t need that strange BigDecimal detour. You can calculate it as

int threadSize = Math.max(1, ascSortedItems.size()/size);

when rounding down or

int threadSize = Math.max(1, (ascSortedItems.size()+size-1)/size);

for rounding up.

But neither will work the intended way. To stay with your example, rounding down will create 14 lists of size one, rounding up will create 7 lists of size two.

A real solution can only be done by not calculating a batch size first:

public static <T> List<List<T>> partition(List<T> ascSortedItems, int numLists) {
    if(numLists < 2) {
        if(numLists < 1) throw new IllegalArgumentException();
        return Collections.singletonList(ascSortedItems);
    }
    int listSize = ascSortedItems.size();
    if(listSize <= numLists) {
        return ascSortedItems.stream()
            .map(Collections::singletonList)
            .collect(Collectors.toList());
    }
    return IntStream.range(0, numLists)
        .mapToObj(i -> ascSortedItems.subList(i*listSize/numLists, (i+1)*listSize/numLists))
        .collect(Collectors.toList());
}

The first part checks for the validity of the numLists argument and handles the trivial special case of one list.

The middle part handles your requirement of returning less lists if the source list is smaller than the requested number of lists (otherwise, the result would always have the requested number of lists, possibly containing empty lists).

The last part does the actual job. As you can see, the initial IntStream.range(0, numLists) invariably produces numLists elements which are then mapped to the sublists, postponing rounding to the latest point possible. For your example of [ 1, 2, 3, 4] and three requested lists, it produces

[1]
[2]
[3, 4]

for the 14 elements and requesting ten lists, it produces

[1]
[2]
[3, 4]
[5]
[6, 7]
[8]
[9]
[10, 11]
[12]
[13, 14]

It’s unavoidable to have some lists of size one and some of size two to fulfill the request to have exactly ten lists. That’s the fundamental difference to the first, size goal based, solution where at most one list has a different size than the others.

Holger
  • 285,553
  • 42
  • 434
  • 765
1

n is the number of elements in the original list
z is the grid size
s = z/n (integer division) gives the base number of items each array should hold
r is the remainder of the integer division

Run a loop for the first z-r arrays appending each with s items in correct order

Run second loop for the last r arrays appending each with s + 1 items in correct order.

Joakim Danielson
  • 43,251
  • 5
  • 22
  • 52
1

Assumption: The following answer would work only with ordered & sequential streams. The question asks to break a sorted list(which is ordered) into smaller sorted lists.

If the stream is unordered, you'll have to order the stream by ways like Stream.sorted().


Let's construct a list of 17 elements.

List<Long> sourceList = LongStream.range(0,17).collect(ArrayList::new,ArrayList::add,ArrayList::addAll);

To divide 17 elements into equal sets of 10 lists, we will have to make 7 lists of 2 elements & 3 lists of 1 element.

In other words, 10 equal lists can be created with a min of 1 elements. And 7 extra elements can be added in the 1st 7 lists.

int minElementinEachList = sourceList.size() /10;      //1
int extraElements = sourceList.size() %10;             //7

Till 7(extraElements) lists are created, we can add an extra element in the lists. Use the following code:

AtomicInteger counter = new AtomicInteger(0);
Map<Number,List<Number>> map = sourceList.stream().collect(
    Collectors.groupingBy(it -> {
    int key = counter.getAndIncrement() / (minElementinEachList + 1);
    if(key >= extraElements && (counter.get() + 1) % (minElementinEachList +1 ) == 0){
        counter.getAndIncrement();
    }
    return key;
}));
System.out.println(map);

Output:

{0=[0, 1], 1=[2, 3], 2=[4, 5], 3=[6, 7], 4=[8, 9], 5=[10, 11], 6=[12, 13], 7=[14], 8=[15], 9=[16]}
Pankaj Singhal
  • 15,283
  • 9
  • 47
  • 86
  • 1
    It’s strongly discouraged to use counters this way. And it will break, if you use it with a parallel stream. In fact, it is not guaranteed to work with a sequential stream either, but just happens to do the desired thing with the current implementation. – Holger Jul 20 '18 at 15:51
  • Can you provide a case in sequential stream where it'll fail? – Pankaj Singhal Jul 20 '18 at 15:54
  • 1
    As said, it happens to do the desired thing *with the current implementation*, therefore, there is no example where it fails, *with the current implementation*. This doesn’t change the fact that the *specification* doesn’t make any evaluation order guarantees, hence, it *could* break with a different implementation, as the maintainers of the Stream implementation are not required to ensure compatibility with code relying on the evaluation order. – Holger Jul 20 '18 at 16:07
  • Have added a suitable assumption about stream being ordered (which is a given in the question). A workaround to use `Stream.sorted()` is available if the stream is unordered & the question demands a certain kind of order. Also, for the assertion that the answer could break in a different implementation - I guess we should leave it within the context of the question. An appropriate answer could be given with a different set of pre-conditions. However, I agree that it'll not work with parallel streams because of the use of the counter. – Pankaj Singhal Jul 20 '18 at 16:52
  • 1
    It seems, you still don’t understand the difference between *processing order* and *encounter order*. An ordered stream just implies that there is an *encounter order*, but only if you fulfill the contract (i.e. not to use functions with side effects like counter manipulations), you can expect a result consistent with the encounter order, as the *processing order*, i.e. the order in which your function gets evaluated, can be different. Observing a processing order which matches the encounter order (for anything else than `forEachOrdered`), is just an artifact of the *current implementation*. – Holger Jul 20 '18 at 19:40
  • Can you please explain more on why using side effects like counter manipulation would make the result inconsistent with the encounter order? – Pankaj Singhal Jul 21 '18 at 03:05
  • You may read the sections [Stateless behaviors, Side-effects, and Ordering of the package documentation](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Statelessness). And [this answer](https://stackoverflow.com/a/29218074/2711488) (including Stuart Marks’ comment). Also, what’s said about `peek` in [this answer](https://stackoverflow.com/a/33636377/2711488) applies to side-effects in behavioral parameters too. – Holger Jul 24 '18 at 16:55