5

I often use batch() in Python. Is there an equivalent in JavaScript, now that ES6 has iterators and generator functions?

liborm

3 Answers

7

I had to write one myself, and I'm sharing it here so that I and others can find it easily:

// Successively yields iterators of at most `size` items each.
// Each one has to be fully consumed before the next is requested.
function* batches(iterable, size) {
  const it = iterable[Symbol.iterator]();
  while (true) {
    // Pull the first item up front: if the previous batch ended exactly at
    // the end of the iterable, we don't want to yield an empty batch.
    let {value, done} = it.next();
    if (done) return value;

    yield function*() {
      yield value;
      for (let curr = 1; curr < size; curr++) {
        ({value, done} = it.next());
        if (done) return;

        yield value;
      }
    }();
    if (done) return value;
  }
}

It yields generators rather than, for example, Arrays. You have to fully consume each batch before calling next() on the outer iterator again.
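A minimal usage sketch of the generator above, assuming each batch is drained (here by spreading it into an array) before the outer loop advances. The `batches` function is copied from the answer so the snippet runs on its own; `grouped` and the sample data are illustrative only.

```javascript
// Copy of `batches` from above so the snippet is self-contained.
function* batches(iterable, size) {
  const it = iterable[Symbol.iterator]();
  while (true) {
    let {value, done} = it.next();
    if (done) return value;

    yield function*() {
      yield value;
      for (let curr = 1; curr < size; curr++) {
        ({value, done} = it.next());
        if (done) return;

        yield value;
      }
    }();
    if (done) return value;
  }
}

// Drain every batch (by spreading it) before the loop asks for the next one.
const grouped = [];
for (const batch of batches([1, 2, 3, 4, 5, 6, 7], 3)) {
  grouped.push([...batch]);
}
console.log(grouped); // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7 ] ]
```

Collecting the batches first (for example with Array.from) and reading them later does not give this grouping, because no batch has been consumed by the time the next one is requested.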

Bergi
liborm
  • I hope you don't mind my edit that makes the final value always be emitted from the outer iterator. Feel free to roll it back if you don't like it. – Bergi Jan 25 '19 at 16:43
  • 1
    Thanks, I like your version more .. I did not have enough 'distance' for the final cleanup;) – liborm Jan 25 '19 at 16:45
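To illustrate the edit discussed in these comments, here is a small sketch, assuming a hypothetical `source` generator whose return value should surface on the outer iterator. It drives `batches` by hand with next(), since for...of discards return values. The names `source`, `outer`, `seen` and `finalValue` are illustrative only.

```javascript
// Copy of `batches` from the answer so the snippet is self-contained.
function* batches(iterable, size) {
  const it = iterable[Symbol.iterator]();
  while (true) {
    let {value, done} = it.next();
    if (done) return value;

    yield function*() {
      yield value;
      for (let curr = 1; curr < size; curr++) {
        ({value, done} = it.next());
        if (done) return;

        yield value;
      }
    }();
    if (done) return value;
  }
}

// Hypothetical source: yields 1..4, then returns a final value.
function* source() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;
  return 'finished';
}

const outer = batches(source(), 3);
const seen = [];
let step = outer.next();
while (!step.done) {
  seen.push([...step.value]); // drain the batch before advancing
  step = outer.next();
}
const finalValue = step.value;
console.log(seen);       // [ [ 1, 2, 3 ], [ 4 ] ]
console.log(finalValue); // finished
```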
2

I came here to see what other people had suggested. Here's the version I wrote in TypeScript before finding this post.

// Collects items from an async iterable into arrays of up to `batchSize` items.
async function* batch<T>(iterable: AsyncIterableIterator<T>, batchSize: number) {
  let items: T[] = [];
  for await (const item of iterable) {
    items.push(item);
    if (items.length >= batchSize) {
      yield items;
      items = [];
    }
  }
  // Emit the final, possibly shorter, batch.
  if (items.length !== 0) {
    yield items;
  }
}

This lets you consume an async iterable in batches, as shown below.

async function doYourThing<T>(iterable: AsyncIterableIterator<T>) {
  const itemsPerBatch = 5
  const batchedIterable = batch<T>(iterable, itemsPerBatch)
  for await (const items of batchedIterable) {
    await someOperation(items)
  }
}
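For a self-contained run of the same idea, here is a plain JavaScript sketch with the type annotations dropped. The `countTo` source and the `collect` helper are made up for illustration and stand in for a real async source and for `someOperation`.

```javascript
// JavaScript port of the `batch` generator above (types removed).
async function* batch(iterable, batchSize) {
  let items = [];
  for await (const item of iterable) {
    items.push(item);
    if (items.length >= batchSize) {
      yield items;
      items = [];
    }
  }
  if (items.length !== 0) {
    yield items;
  }
}

// Hypothetical async source: yields 1..n.
async function* countTo(n) {
  for (let i = 1; i <= n; i++) {
    yield i;
  }
}

// Hypothetical consumer: gathers every batch into one array.
async function collect(batched) {
  const result = [];
  for await (const items of batched) {
    result.push(items);
  }
  return result;
}

const collected = collect(batch(countTo(7), 3));
collected.then((result) => console.log(result)); // [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7 ] ]
```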

In my case, this made it a little easier to use bulk operations in MongoDB, as demonstrated below.

import { MongoClient, ObjectID } from 'mongodb';
import { batch } from './batch';

const config = {
  mongoUri: 'mongodb://localhost:27017/test?replicaSet=rs0',
};

interface Doc {
  readonly _id: ObjectID;
  readonly test: number;
}

async function main() {
  const client = await MongoClient.connect(config.mongoUri);
  const db = client.db('test');
  const coll = db.collection<Doc>('test');
  await coll.deleteMany({});
  console.log('Deleted test docs');

  const testDocs = new Array(4).fill(null).map(() => ({ test: 1 }));
  await coll.insertMany(testDocs);
  console.log('Inserted test docs');

  const cursor = coll.find().batchSize(5);
  for await (const docs of batch<Doc>(cursor as any, 5)) {
    const bulkOp = coll.initializeUnorderedBulkOp();
    docs.forEach((doc) => {
      bulkOp.find({ _id: doc._id }).updateOne({ $set: { test: 2 } });
    });
    console.log('Updating', docs.length, 'test docs');
    await bulkOp.execute();
  }
  console.log('Updated test docs');
}

main()
  .catch(console.error)
  .then(() => process.exit());
Ryan Smith
  • I really like your solution because it's generic. I'd propose to reduce the usage example to two or three lines though, to make it easier to see the benefits. – Sebastian Jun 06 '21 at 11:27
0

Here's a relatively clean example in TypeScript:

function* batchIterable<T>(iter: Iterable<T>, batchSize: number): Iterable<Iterable<T>> {
    const iterator = iter[Symbol.iterator]()
    let done = false
    while (!done) {
        const batch: T[] = []
        while (batch.length < batchSize) {
            const res = iterator.next()
            if (res.done) {
                done = true
                break
            } else {
                batch.push(res.value)
            }
        }
        if (batch.length > 0) {
            yield batch
        }
    }
}

Works with any iterable, including arrays:

> Array.from(batchIterable([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
[ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ], [ 10 ] ]

And also with generators:

function* genNums() { 
    yield 1; 
    yield 2; 
    yield 3; 
    yield 4;
}
> Array.from(batchIterable(genNums(), 3))
[ [ 1, 2, 3 ], [ 4 ] ]

But the value a generator returns, as opposed to the values it yields, is not included:

function* genNums() { 
    yield 1; 
    yield 2; 
    yield 3; 
    yield 4;

    return 5;
}
> Array.from(batchIterable(genNums(), 3))
[ [ 1, 2, 3 ], [ 4 ] ]  // returned value 5 is not included
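
This matches how built-in iteration treats generators: for...of, spread and Array.from all drop the return value, which is only visible on the result object where done is true. A short sketch, reusing genNums from above (`yielded`, `it` and `last` are illustrative names):

```javascript
function* genNums() {
  yield 1;
  yield 2;
  yield 3;
  yield 4;

  return 5;
}

// Spread uses the iteration protocol and ignores the return value.
const yielded = [...genNums()];
console.log(yielded); // [ 1, 2, 3, 4 ]

// Calling next() by hand exposes it on the final result.
const it = genNums();
let last = it.next();
while (!last.done) {
  last = it.next();
}
console.log(last); // { value: 5, done: true }
```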
Nishant George Agrwal