
My task: I have a file that contains many items, and each item is associated with an array of image URLs that I need to download. I want to download all of the links. I'm using this library for the image downloading, and I'm using promises.

The problem: When I start downloading many images from many items, the program sends more than 4000 requests before the first one finishes, and the program crashes.

My solution: My idea was to handle only about 2 items at a time, so that I'm only downloading about 20 images at a time. I've tried all sorts of variations with promises and async functions, but I'm pretty new to those, so my attempts failed.

My code flow is something like this:

csvRun()

function csvRun(){
    for(let i = 1; i <= itemsAmount; i++){  // Loops over the items
        // I want to be able to run only x items at a time
        console.log('Item number ' + i)
        itemHandle() 
    }
}

function itemHandle(){ // This function seems useless here but since the item has more data I kept it here
    handleImages()
}


function handleImages(){  // Loops over the images of the item
    for(let g = 0; g < imagesAmount; g++){        
        // Here there is a promise that downloads images
        // For the example I'll use setTimeout
        setTimeout(() => {
            console.log('Image downloaded ' + g)
        }, 3000);

        /** If you want to reproduce the error, use ImgDownload instead
            of setTimeout and set imagesAmount to 20 and itemsAmount
            to 400
        */ 

    }

}

// Only here to recreate the error. Not necessarily relevant.
function ImgDownload(){
    var download = require('image-downloader')
    download // returns the promise so the handling could resume in order
    .image({
        url:
            "https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg",
        dest: "/folder/img.jpg"
    })
    .then(({ filename, image }) => {
        console.log("File saved to", filename);
    })
    .catch((err) => {
        console.error(err);
    });
}

Currently, the code finishes the loop in csvRun, printing Item number 1 up to Item number {itemsAmount}, and after 3 seconds prints out all of the Image downloaded messages. I understand why that happens. I want to change the code so that only 2 calls to itemHandle run simultaneously at any time.

Koby 27

4 Answers


One option would be to have a loop that goes over the images and processes one after another. To then run multiple processings in parallel, start multiple loops:

// Goes over the "data" array, calls and waits for each "task", and processes "runnerCount" tasks in parallel
function inParallel(task, data, runnerCount) {
  let i = 0, results = [];

  async function runner() {
    while (i < data.length) {
      const pos = i++; // be aware: concurrent modification of i
      const entry = data[pos];
      results[pos] = await task(entry);
    }
  }

  const runners = Array.from({ length: runnerCount }, runner);

  return Promise.all(runners).then(() => results);
}

To be used as:

const delay = ms => new Promise(res => setTimeout(res, ms));

inParallel(async time => {
  console.log(`Timer for ${time}ms starts`);
  await delay(time);
  console.log(`Timer for ${time}ms ends`);
}, [5000, 6000, 1000]/*ms*/, 2/*in parallel*/);
Jonas Wilms
  • Wow, what a fascinating technique! Would you mind cross-posting this to [Limited parallelism with async/await](https://stackoverflow.com/q/39195441/1048572)? The only disadvantage is that it keeps running after a promise rejects. – Bergi Jul 14 '19 at 20:22
  • `Array.from({length: runnerCount}, runner)` :-) – Bergi Jul 14 '19 at 20:23
  • @bergi indeed thats better. Also I'm interested on your opinion on https://es.discourse.group/t/why-cant-parts-of-the-spec-be-written-in-javascript-itself/82/2 – Jonas Wilms Jul 14 '19 at 20:27
  • And concerning the error handling: `try { ... } catch { i = Infinity; }` could work – Jonas Wilms Jul 14 '19 at 20:28
  • How would I get the result array? – Koby 27 Jul 18 '19 at 20:18
  • @Koby27 the result of this should be an array: `const results = await inParallel(task, array, 2)` – Nick Tomlin Jul 18 '19 at 22:29
  • @bergi cross-posting seems to be discouraged, a mod just deleted the other answer. – Jonas Wilms Jul 19 '19 at 08:42
  • @JonasWilms Uh, yes, you shouldn't just copy-paste the answer, you'd need to adapt it to the other (more generic) question. Basically, writing a new answer to that question, using the same solution technique. – Bergi Jul 19 '19 at 08:45
  • @JonasWilms Yes, I guess that was the problem. Also iirc mods are notified when 1:1-copies of answers are posted – Bergi Jul 19 '19 at 09:50

With vanilla promises you might do something like this:

let pending_fetches = 0
const MAX_CONCURRENT = 2

const fetch_interval = setInterval(() => {
    if (items.length === 0) return clearInterval(fetch_interval)

    if (pending_fetches < MAX_CONCURRENT) {
        ++pending_fetches
        doFetch(items.pop())
            .then(response => {
                // do stuff with the response
            })
            .catch(console.error)
            .finally(() => --pending_fetches) // decrement even if the fetch fails
    }    
}, 100)

With async/await, something like this:

const MAX_CONCURRENT = 2

const fetchLoop = async () => {
    while (items.length > 0) {
        const response = await doFetch(items.pop())
        // do stuff with the response
    }
}
for (let i = 0; i < MAX_CONCURRENT; ++i) fetchLoop()
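One caveat with the async/await sketch above: a rejection from doFetch kills that loop permanently, and nothing tells you when all the loops have drained the queue. A minimal self-contained variant (items and doFetch here are stand-in names, not the asker's real data) catches per-item failures and collects the loop promises with Promise.all:

```javascript
const MAX_CONCURRENT = 2;

// stand-ins for the real work queue and fetch function
const items = [1, 2, 3, 4, 5];
const doFetch = (item) =>
  new Promise((resolve) => setTimeout(() => resolve(item * 10), 10));

async function fetchLoop(results) {
  while (items.length > 0) {
    const item = items.pop(); // pop synchronously so loops never grab the same item
    try {
      results.push(await doFetch(item));
    } catch (err) {
      console.error(err); // one failure shouldn't stop this runner
    }
  }
}

async function runAll() {
  const results = [];
  // start MAX_CONCURRENT loops over the shared queue and wait for all of them
  await Promise.all(
    Array.from({ length: MAX_CONCURRENT }, () => fetchLoop(results))
  );
  return results;
}
```

Because pop() is synchronous, the two loops can never process the same item, and runAll resolves only once the queue is empty.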
Robert Lombardo

Let's suppose your data looks like this

const items = [
  { id: 1,
    images: [
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
     ]
  },
  { id: 2,
    images: [
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
     ]
  },
  { id: 3,
    images: [
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
      'https://cdn.vox-cdn.com/thumbor/XKPu8Ylce2Cq6yi_pgyLyw80vb4=/0x0:1920x1080/1200x800/filters:focal(807x387:1113x693)/cdn.vox-cdn.com/uploads/chorus_image/image/63380914/PIA16695_large.0.jpg',
     ]
  }
];

I would run a simple for..of loop, iterating over the items and downloading each item's images before moving on to the next:

// this function downloads the images of each item, one item at a time
const download = require('image-downloader')
const downloadImages = async (items = []) => {
  for (const item of items) {
    // dest is item.id/imageIndex.jpg
    const promises = item.images.map((url, index) =>
      download.image({ url, dest: `/folder/${item.id}/${index}.jpg` })
    );
    await Promise.all(promises);
  }
}

downloadImages(items);
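This processes one item at a time (with that item's images in parallel). Since the question asked for two items at a time, the same idea can be extended by chunking the items into pairs; chunk, downloadInPairs, and handleItem below are hypothetical helper names, not part of image-downloader:

```javascript
// hypothetical helper: split an array into chunks of `size`
const chunk = (arr, size) =>
  Array.from({ length: Math.ceil(arr.length / size) }, (_, i) =>
    arr.slice(i * size, i * size + size)
  );

// run `handleItem` on `size` items concurrently,
// waiting for each chunk to finish before starting the next
const downloadInPairs = async (items, handleItem, size = 2) => {
  const results = [];
  for (const pair of chunk(items, size)) {
    results.push(...(await Promise.all(pair.map(handleItem))));
  }
  return results;
};
```

In the asker's case, handleItem would be a function that downloads all images of one item, e.g. `item => Promise.all(item.images.map((url, i) => download.image({ url, dest: `/folder/${item.id}/${i}.jpg` })))`.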
Shankar Regmi

I think I still prefer Jonas's implementation for being concise, but I'll add another to the ring. A few features:

  1. Results and errors are available in a stable array (based on position).
  2. This starts processing another item as soon as the individual worker function has finished, instead of batching things and waiting for each Promise.all to resolve.

function parallelMap(values, workFn, maxConcurrency = 2) {
  const length = values.length;
  const results = Array.from({ length });

  let pos = 0;
  let completed = 0;

  return new Promise(resolve => {
    function work() {
      if (completed === length) {
        return resolve(results);
      }

      if (pos >= length) {
        return;
      }

      const workIndex = pos;
      const item = values[workIndex];
      pos = pos + 1;

      return workFn(item, workIndex)
        .then(result => {
          results[workIndex] = result;
          completed = completed + 1;
          work();
        })
        .catch(result => {
          results[workIndex] = result;
          completed = completed + 1;
          work();
        });
    }

    for (let i = 0; i < maxConcurrency; i++) {
      work();
    }
  });
}

Usage:

async function fakeRequest({ value, time = 100, shouldFail = false }) {
  return new Promise((resolve, reject) => {
    setTimeout(() => {
      if (shouldFail) {
        reject("Failure: " + value);
      } else {
        resolve("Success: " + value);
      }
    }, time);
  });
}

test("basic 'working' prototype", async () => {
  const values = [1, 2, 3, 4, 5, 6];
  const results = await parallelMap(values, value => {
    return fakeRequest({ value, time: 100, shouldFail: value % 2 === 0 });
  });

  expect(results).toEqual([
    "Success: 1",
    "Failure: 2",
    "Success: 3",
    "Failure: 4",
    "Success: 5",
    "Failure: 6"
  ]);
}, 350); // each task takes ~100ms to complete, 6 tasks, two at a time = ~300 ms

See the codesandbox for a full test suite.

Nick Tomlin