I often use batch() in Python. Is there an alternative in JavaScript, now that ES6 has iterators and generator functions?
Viewed 1,241 times
5

liborm
3 Answers
7
I had to write one myself, so I'm sharing it here for me and others to find easily:
// Successively yields iterators that each produce up to `size` items.
// Each one has to be fully consumed before the next is requested.
function* batches(iterable, size) {
  const it = iterable[Symbol.iterator]();
  while (true) {
    // Pull the first item up front: if the previous batch ended exactly at
    // the end of the iterable, we don't want to yield an empty batch.
    let {value, done} = it.next();
    if (done) return value;
    yield function*() {
      yield value;
      for (let curr = 1; curr < size; curr++) {
        ({value, done} = it.next());
        if (done) return;
        yield value;
      }
    }();
    // The inner generator may have hit the end; pass on the final return value.
    if (done) return value;
  }
}
It yields generators, not Arrays. You have to fully consume each batch before calling next() on the outer iterator again.
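To make the "fully consume each batch" requirement concrete, here is a small usage sketch. The function is copied from above only so the snippet runs on its own; the variable names grouped and unconsumed are mine, not part of the answer.

```javascript
// Copy of batches() from the answer, so this snippet is self-contained.
function* batches(iterable, size) {
  const it = iterable[Symbol.iterator]();
  while (true) {
    let {value, done} = it.next();
    if (done) return value;
    yield function*() {
      yield value;
      for (let curr = 1; curr < size; curr++) {
        ({value, done} = it.next());
        if (done) return;
        yield value;
      }
    }();
    if (done) return value;
  }
}

// Correct use: spread each inner generator inside the loop body, so it is
// exhausted before the outer generator is advanced again.
const grouped = [];
for (const batch of batches([1, 2, 3, 4, 5, 6, 7], 3)) {
  grouped.push([...batch]);
}
console.log(grouped); // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7 ] ]

// Incorrect use: Array.from advances the outer generator without touching the
// inner ones, so every source item starts a "batch" of its own.
const unconsumed = Array.from(batches([1, 2, 3, 4], 2));
console.log(unconsumed.length); // 4, not the 2 batches you might expect
```

If you need the batches to survive past the loop body, copying each one into an array as above is probably the simplest way.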
- I hope you don't mind my edit that makes the final value always be emitted from the outer iterator. Feel free to roll it back if you don't like it. – Bergi Jan 25 '19 at 16:43
- Thanks, I like your version more. I did not have enough 'distance' for the final cleanup ;) – liborm Jan 25 '19 at 16:45
2
Came here looking to see what other people had suggested. Here's the version I wrote in TypeScript initially before looking at this post.
// Collects items from an async iterable into arrays of up to `batchSize` items.
async function* batch<T>(iterable: AsyncIterableIterator<T>, batchSize: number) {
  let items: T[] = [];
  for await (const item of iterable) {
    items.push(item);
    if (items.length >= batchSize) {
      yield items;
      items = [];
    }
  }
  // Emit the final, possibly shorter, batch.
  if (items.length !== 0) {
    yield items;
  }
}
This allows you to consume an iterable in batches as shown below.
async function doYourThing<T>(iterable: AsyncIterableIterator<T>) {
  const itemsPerBatch = 5;
  const batchedIterable = batch<T>(iterable, itemsPerBatch);
  for await (const items of batchedIterable) {
    // someOperation stands in for whatever async work you do per batch
    await someOperation(items);
  }
}
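For anyone who wants to try this without a database, here is a runnable sketch in plain JavaScript (type annotations dropped). The numbers() source and the collectBatches() helper are made up for illustration and are not part of the answer.

```javascript
// Same logic as batch<T>() above, without the TypeScript annotations.
async function* batch(iterable, batchSize) {
  let items = [];
  for await (const item of iterable) {
    items.push(item);
    if (items.length >= batchSize) {
      yield items;
      items = [];
    }
  }
  if (items.length !== 0) {
    yield items;
  }
}

// Hypothetical async source: yields 1..count, one item per microtask.
async function* numbers(count) {
  for (let i = 1; i <= count; i++) {
    yield i;
  }
}

// Hypothetical helper: gathers all batches into one array for inspection.
async function collectBatches(source, batchSize) {
  const out = [];
  for await (const items of batch(source, batchSize)) {
    out.push(items);
  }
  return out;
}

collectBatches(numbers(7), 3).then((result) => {
  console.log(result); // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7 ] ]
});
```

Since for await...of also accepts synchronous iterables, passing a plain array as the source should work as well.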
In my case, this allowed me to use bulkOps in Mongo a little more easily as demonstrated below.
import { MongoClient, ObjectID } from 'mongodb';
import { batch } from './batch';

const config = {
  mongoUri: 'mongodb://localhost:27017/test?replicaSet=rs0',
};

interface Doc {
  readonly _id: ObjectID;
  readonly test: number;
}

async function main() {
  const client = await MongoClient.connect(config.mongoUri);
  const db = client.db('test');
  const coll = db.collection<Doc>('test');
  await coll.deleteMany({});
  console.log('Deleted test docs');
  const testDocs = new Array(4).fill(null).map(() => ({ test: 1 }));
  await coll.insertMany(testDocs);
  console.log('Inserted test docs');
  const cursor = coll.find().batchSize(5);
  for await (const docs of batch<Doc>(cursor as any, 5)) {
    const bulkOp = coll.initializeUnorderedBulkOp();
    docs.forEach((doc) => {
      bulkOp.find({ _id: doc._id }).updateOne({ test: 2 });
    });
    console.log('Updating', docs.length, 'test docs');
    await bulkOp.execute();
  }
  console.log('Updated test docs');
}

main()
  .catch(console.error)
  .then(() => process.exit());

Ryan Smith
- I really like your solution because it's generic. I'd propose to reduce the usage example to two or three lines though, to make it easier to see the benefits. – Sebastian Jun 06 '21 at 11:27
0
Here's a relatively clean example in TypeScript:
function* batchIterable<T>(iter: Iterable<T>, batchSize: number): Iterable<Iterable<T>> {
  const iterator = iter[Symbol.iterator]()
  let done = false
  while (!done) {
    const batch: T[] = []
    while (batch.length < batchSize) {
      const res = iterator.next()
      if (res.done) {
        done = true
        break
      } else {
        batch.push(res.value)
      }
    }
    if (batch.length > 0) {
      yield batch
    }
  }
}
Works with any iterable, including arrays:
> Array.from(batchIterable([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
[ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ], [ 10 ] ]
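As a quick check of the "any iterable" claim, here is a sketch using a plain-JavaScript port of batchIterable (annotations removed, which I'm assuming changes nothing at runtime) applied to a Set and a string. The variable names fromSet and fromString are mine.

```javascript
// Plain-JavaScript port of batchIterable() from above.
function* batchIterable(iter, batchSize) {
  const iterator = iter[Symbol.iterator]()
  let done = false
  while (!done) {
    const batch = []
    while (batch.length < batchSize) {
      const res = iterator.next()
      if (res.done) {
        done = true
        break
      }
      batch.push(res.value)
    }
    if (batch.length > 0) {
      yield batch
    }
  }
}

// A Set iterates in insertion order.
const fromSet = Array.from(batchIterable(new Set(['a', 'b', 'c']), 2))
console.log(fromSet) // [ [ 'a', 'b' ], [ 'c' ] ]

// A string iterates over its characters (code points).
const fromString = Array.from(batchIterable('hello', 2))
console.log(fromString) // [ [ 'h', 'e' ], [ 'l', 'l' ], [ 'o' ] ]
```

Because each batch here is a real array, it stays valid after the outer iterator moves on, so wrapping the whole thing in Array.from is safe.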
And also with generators:
function* genNums() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;
}
> Array.from(batchIterable(genNums(), 3))
[ [ 1, 2, 3 ], [ 4 ] ]
However, if a generator also returns a value, that value is not included, because it is separate from the yielded values:
function* genNums() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;
  return 5;
}
> Array.from(batchIterable(genNums(), 3))
[ [ 1, 2, 3 ], [ 4 ] ] // returned value 5 not included
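For what it's worth, this matches how the built-in consumers behave: for...of and spread also ignore a generator's return value. If you do need it, one option is to drive the iterator by hand, as in this sketch. The drain() helper is a made-up name for illustration.

```javascript
function* genNums() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;
  return 5;
}

// Spread (like for...of) stops at done: true and drops the return value.
const spreadValues = [...genNums()];
console.log(spreadValues); // [ 1, 2, 3, 4 ]

// Hypothetical helper: calls next() manually so that the value delivered
// together with done: true can be captured as well.
function drain(iterator) {
  const yielded = [];
  let res = iterator.next();
  while (!res.done) {
    yielded.push(res.value);
    res = iterator.next();
  }
  return { yielded, returned: res.value };
}

const drained = drain(genNums());
console.log(drained); // { yielded: [ 1, 2, 3, 4 ], returned: 5 }
```

A batching function could expose the return value the same way, for example by returning it from the outer generator.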

Nishant George Agrwal