
I have a use case where I need to pick up x elements at a time, in batches, from a total of n elements. Currently I'm using Google Guava's Lists like this:

List<String> dataList;

List<List<String>> smallerLists = Lists.partition(dataList, maxRecordsInABatch);

for (List<String> smallerList : smallerLists) {
    doSomething(smallerList);
}

Is there a better way to do this, one that avoids creating smallerLists and instead picks x-sized lists from the n-sized list on the fly?

I also explored Java 8 Streams and lambda expressions but couldn't find anything that covers this.

dushyantashu
  • Are you looking to just split the data, or to perform parallel tasks after splitting it? – Thirler Nov 22 '16 at 13:21
  • http://stackoverflow.com/a/30694866/829571 – assylias Nov 22 '16 at 13:26
  • If you have a look at the source of `Lists.partition()`, you'll see that you get a `Partition` that calls `subList()` on the list you pass. For the built-in lists, that call creates small objects that operate on the underlying list, so your code should not in fact generate any new lists. – Thomas Nov 22 '16 at 13:30
  • Lists.partition() strikes me as a very good way to do it; I can't tell how to beat it without more information about what you think of as "better". None of the items in the list are duplicated, and virtually no object pointers are duplicated, either - according to the javadocs, the smaller lists are a view onto the larger list. The only two possible improvements I can guess at are that you want to modify the smaller lists without altering the larger one (soln: clone the sublists), or you want to do some sort of cloud job. Do either of these hold? – Paul Brinkley Nov 22 '16 at 13:31
  • @Thomas @Paul Brinkley My only concern was data redundancy. I thought `Lists.partition()` created new smaller lists holding copies of the data from the original list. If that is not the case, and I'm in fact working with the same underlying list, then I'm fine with my approach. – dushyantashu Nov 22 '16 at 13:40
  • In fact even if you'd get new lists they'd still refer to the same data, i.e. the elements in a list will not be copied, just the references to them (which is about 8 bytes per reference). – Thomas Nov 22 '16 at 13:45
  • Similar to [this question](http://stackoverflow.com/q/28210775/2711488). If the source is a `List` and the result ought to be a `List<List<…>>`, i.e. neither source nor target ought to be a stream, `Lists.partition` is good enough (note the second answer there)… – Holger Nov 22 '16 at 14:10
  • The concern about redundancy is definitely fair. Even a concern about *pointer* redundancy is fair, IMO, if the list is very large. And in fact, I'd misread the javadoc for Lists.partition(); it's copying all of those pointers for the outer list, whereas I'd thought the sublists were views into the input list. That said, if you're only concerned about data redundancy, all's well. – Paul Brinkley Nov 22 '16 at 18:56
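
The `subList()` point raised in the comments can be illustrated with plain JDK calls. The following is only a minimal sketch, not Guava's implementation: `BatchViews` and `forEachBatch` are hypothetical names introduced here, and the sketch assumes the source list is not structurally modified while batches are being processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class BatchViews {
    // Hypothetical helper: hands each batch to the consumer as a subList view,
    // so no element or reference is copied into a new list.
    static <T> void forEachBatch(List<T> source, int batchSize, Consumer<List<T>> action) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            action.accept(source.subList(from, to));
        }
    }

    public static void main(String[] args) {
        List<String> dataList = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "e"));

        // Prints [a, b], then [c, d], then [e]
        forEachBatch(dataList, 2, batch -> System.out.println(batch));

        // A write through the view is visible in the source list,
        // which shows that the view shares the underlying storage.
        dataList.subList(0, 2).set(0, "A");
        System.out.println(dataList.get(0)); // prints A
    }
}
```

Because each batch is a view, modifying a batch also modifies the original list; copy the batch first (for example with `new ArrayList<>(batch)`) if that is not wanted.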

2 Answers


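There is a small trick you can use to partition a list with Java 8 streams: first group the elements by batch number, then build a new list from the grouped values. Note that the classifier here is computed from each element's value, so it yields batches of three only because the list holds 1 to 8 in order; for arbitrary data you would group by index instead. Something like this: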

List<Integer> intList = Lists.newArrayList(1, 2, 3, 4, 5, 6, 7, 8);
// Classify by value: 1..3 -> 0, 4..6 -> 1, 7..8 -> 2
Map<Integer, List<Integer>> groups = intList.stream().collect(Collectors.groupingBy(s -> (s - 1) / 3));
System.out.println(groups);
// Copy the grouped values into a list of batches
List<List<Integer>> subSets = new ArrayList<>(groups.values());
System.out.println(subSets);
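
The same grouping idea can be applied to positions rather than values, which makes it work for any element type. This is a hedged sketch, not part of the original answer: `IndexGrouping` and `batches` are hypothetical names, a `TreeMap` is used so the batches come out in index order, and the source list is assumed to support fast `get(int)` (for example an `ArrayList`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class IndexGrouping {
    // Hypothetical helper: groups element indices by (index / batchSize)
    // and maps each index back to its element.
    static <T> List<List<T>> batches(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return new ArrayList<>(
            IntStream.range(0, source.size()).boxed()
                .collect(Collectors.groupingBy(
                    i -> i / batchSize,   // batch number for this index
                    TreeMap::new,         // keeps batches sorted by batch number
                    Collectors.mapping(source::get, Collectors.toList())))
                .values());
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "b", "c", "d", "e", "f", "g");
        System.out.println(batches(data, 3)); // prints [[a, b, c], [d, e, f], [g]]
    }
}
```

Unlike `Lists.partition`, this approach builds new inner lists and copies the element references into them, so it does not reduce memory use; it only removes the dependency on the element values.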
0

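You can use the `Collectors.partitioningBy` collector, as described at the following link: https://www.javacodegeeks.com/2015/11/java-8-streams-api-grouping-partitioning-stream.html. Keep in mind that `partitioningBy` splits a stream into exactly two groups according to a predicate, so on its own it does not produce fixed-size batches.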
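
To make the behaviour of `Collectors.partitioningBy` concrete, here is a small sketch. The class name `PartitioningByDemo`, the method `splitByThreshold`, and the threshold value are assumptions chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class PartitioningByDemo {
    // Hypothetical helper: splits values into those at or below the threshold
    // (key true) and those above it (key false).
    static Map<Boolean, List<Integer>> splitByThreshold(List<Integer> values, int threshold) {
        return values.stream().collect(Collectors.partitioningBy(v -> v <= threshold));
    }

    public static void main(String[] args) {
        List<Integer> intList = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        Map<Boolean, List<Integer>> parts = splitByThreshold(intList, 3);
        System.out.println(parts.get(true));  // prints [1, 2, 3]
        System.out.println(parts.get(false)); // prints [4, 5, 6, 7, 8]
    }
}
```

The result always has the two keys `true` and `false`, regardless of how many elements the input has, so splitting n elements into batches of x would need a different collector or an index-based approach.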

KayV