I'm building a browser tool that samples a big file and shows some stats about it.

The program picks k random parts of a file and processes each part separately. As each part finishes, an object tracking the rolling "stats" of the file is updated (in the example below, I've simplified this to incrementing a rolling counter).

The issue is that every part is currently read in parallel, but I'd like them to be read in series, so that the updates to the rolling counter are thread safe.

I think the next processFileChunk call in the for-loop is executing before the previous one finishes. How do I get this to run serially?

I'm fairly new to Vue, and to frontend development in general. Is this a simple asynchronicity problem? How can I tell whether something is asynchronous?

Edit: the parsing step uses the papaparse library (which I suspect is the asynchronous part).

import {parse} from 'papaparse'

export default {

  data() {
    return {
      counter: 0
    }
  },

  methods: {
    streamAndSample(file) {
      const k = 10 // number of samples
      const pointers = PickRandomPointers(file) // an array of k integers, each a random byte offset into the file

      for (const k_th_random_pointer of pointers) {
        this.processFileChunk(file, k_th_random_pointer)
      }
    },

    processFileChunk(file, k_th_random_pointer) {
      var vm = this;

      var reader = new FileReader();
      reader.onload = function (oEvent) {
        var text = oEvent.target.result
        parse(text, {complete: function (res) {
          for (var i = 0; i < res.data.length; i++) {
            vm.counter = vm.counter + 1
          }
        }})
      }
      reader.readAsText(file.slice(k_th_random_pointer, k_th_random_pointer + 100000)) // read 100 KB
    }
  }
}

  • You should use promises: https://www.promisejs.org/patterns/#all. Another approach is for await...of (async/await): https://stackoverflow.com/questions/59694309/for-await-of-vs-promise-all – flakerimi Mar 23 '21 at 18:16

1 Answer

"thread safe" JavaScript

JavaScript is single-threaded, so only one thread of execution runs at a time. Async operations are put into an event queue, and each callback runs to completion before the next one starts.

Perhaps you meant a "race condition", where how long each chunk takes to read and parse, rather than the order in which the reads were started, determines when it affects the counter. That is, a chunk that parses quickly might bump the counter before one that the parser started on first.
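Here's a minimal, self-contained sketch of that effect (illustrative only, not from the original code; setTimeout stands in for the variable read-and-parse time of each chunk):

let counter = 0

function fakeParse(label, ms) {
  // setTimeout simulates a FileReader + papaparse job finishing after `ms` milliseconds
  setTimeout(() => {
    counter++
    console.log(`${label} done; counter = ${counter}`)
  }, ms)
}

fakeParse('chunk A (slow)', 50) // started first...
fakeParse('chunk B (fast)', 10) // ...but finishes (and bumps the counter) first

Both updates still apply safely because only one callback runs at a time; it's only the order that is unpredictable.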

Awaiting each result

To await the completion of each chunk's parse before moving on to the next, return a Promise from processFileChunk() that resolves to the parsed data length:

export default {
  methods: {
    processFileChunk(file, k_th_random_pointer) {
      return new Promise((resolve, reject) => {
        const reader = new FileReader()
        reader.onload = oEvent => {
          const text = oEvent.target.result
          const result = parse(text)
          resolve(result.data.length)
        }
        reader.onerror = err => reject(err)
        reader.onabort = () => reject()

        reader.readAsText(file.slice(k_th_random_pointer, k_th_random_pointer + 100000)) // read 100 KB
      })
    }
  }
}

Then make streamAndSample() an async function in order to await the result of each processFileChunk() call (the result is the data length resolved in the Promise):

export default {
  methods: {
     
    async streamAndSample(file) {
      const k = 10
      const pointers = PickRandomPointers(file)

      for (const k_th_random_pointer of pointers) {
        const length = await this.processFileChunk(file, k_th_random_pointer)
        this.counter += length
      }
    }
  }
}
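
Since JavaScript is single-threaded, serializing is only necessary if the order of the updates matters. If it doesn't, the chunks can be processed concurrently and the lengths summed once they all resolve, along the lines of the Promise.all approach mentioned in the comments. A sketch under the same assumptions as above:

export default {
  methods: {
    async streamAndSample(file) {
      const pointers = PickRandomPointers(file)

      // Start every chunk read at once; Promise.all resolves to an array of lengths
      const lengths = await Promise.all(
        pointers.map(pointer => this.processFileChunk(file, pointer))
      )
      this.counter = lengths.reduce((sum, len) => sum + len, 0)
    }
  }
}

Note this rejects if any single chunk fails; Promise.allSettled could be used instead to keep the successful ones.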

Aside: Instead of passing a cached this into a callback, use an arrow function, which automatically preserves the context. I've done that in the code blocks above.
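
For example, a before/after fragment (illustrative, not from the original code):

// Before: cache `this` so the regular-function callback can reach the component
var vm = this
reader.onload = function (oEvent) {
  vm.counter++ // `this` here would NOT be the component
}

// After: an arrow function keeps the surrounding `this` automatically
reader.onload = oEvent => {
  this.counter++ // `this` is still the component
}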

It's worth noting that papaparse's parse() also supports streaming a File in chunks for large files (although a starting read offset cannot be specified), so processFileChunk() could be rewritten like this:

export default {
  methods: {
    processFileChunk(file, k_th_random_pointer) {
      return new Promise((resolve, reject) => {
        let total = 0
        parse(file, {
          // When streaming, parse results are not accumulated for the `complete`
          // callback, so count the rows in each chunk as it arrives
          chunk(res, parser) {
            total += res.data.length
          },
          chunkSize: 100000,
          complete: () => resolve(total),
          error: err => reject(err)
        })
      })
    }
  }
}
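
Keep in mind that because a starting offset can't be given, this version parses from the beginning of the file on every call, so invoking it once per sample re-reads the whole file k times. For random sampling, the FileReader-based version above is likely cheaper.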
– tony19
  • Aha!! This is really helpful. You're pointing out that "asynchronous" and "multi-threaded" are two distinct topics. If JavaScript is single-threaded, then incrementing a global counter in a streaming fashion is fine, since only one operation updates it at any point in time. – Kostas Papadopoulos Mar 26 '21 at 16:26