
I have a file open in the browser that I want to read with JS through a stream. I want to read the file in 1KB chunks, but it always reads the whole file.

import { ReadableStream as PolyfillReadableStream } from 'web-streams-polyfill';
import { createReadableStreamWrapper } from '@mattiasbuelens/web-streams-adapter';

const toPolyfillReadable = createReadableStreamWrapper(PolyfillReadableStream);

const file = myFile;
const fStreamReader = toPolyfillReadable(file.stream(), new ByteLengthQueuingStrategy({
  highWaterMark: 1024,
}));
const stream = [];

for await (const value of fStreamReader) {  // This works because I'm using web-streams-polyfill
  console.log(value);  // This only runs once and prints the whole file.
}
user5507535

2 Answers


The following is a simplified version of https://stackoverflow.com/a/28318964/16462950. When I used it to read a 28MB file, the allocation timeline always stayed at around 1KB:

[Screenshot: memory allocation timeline staying at around 1KB]

async function read(file) {
  // Read 1KB at a time: slice() creates a 1KB view of the file,
  // and text() reads only that slice into memory.
  for (let offset = 0; offset < file.size; offset += 1024) {
    const oneKB = await file.slice(offset, offset + 1024).text();
    console.log(oneKB);
  }
}
<input name="file" type="file" onchange="read(this.files[0])" />

A file.stream() is faster, but consumes 64KB chunks and therefore allocates more memory.
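For comparison, here is a minimal sketch (assuming a File object named file, as in the question) that logs the size of each chunk file.stream() delivers; the chunk size is chosen by the browser, not by the caller:

async function logChunkSizes(file) {
  // The browser decides the chunk size; this loop only observes it.
  const reader = file.stream().getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log(value.byteLength); // typically 64KB per chunk in Chrome
  }
}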

Heiko Theißen
  • Here, on Chrome, the chunk size is not 64KB; it varies a lot and is usually much larger than 64KB, sometimes larger than 1MB. – user5507535 Feb 13 '23 at 17:27
  • I tested it on Chrome (without polyfills) and observed 64KB chunks. – Heiko Theißen Feb 13 '23 at 17:36
  • When the writer is processing a chunk, does the reader stop reading? The idea of getting chunks is to limit the memory consumed, so if I read a 1GB file I don't want to consume 1GB of the user's memory. – user5507535 Feb 13 '23 at 17:56
  • Sorry, but this answer didn't work for me. I'm running a React Native app on iOS and it reads the whole file anyway; even with a 20MB file it reads everything as a single chunk. I need a way to break this down, at least into 1MB chunks. – user5507535 Feb 13 '23 at 22:39
  • On Safari, it always reads the whole file as a single chunk, even for a 1GB file. – user5507535 Feb 14 '23 at 00:12
  • I have changed my answer, does that help you? – Heiko Theißen Feb 14 '23 at 07:32
  • This is what I also thought of doing, though it's not the optimal approach. Streams are more performant, as you said. By the way, it doesn't need to be 1KB; 1MB is OK for me, so 64KB chunks from the stream are fine, but the problem is that Safari always loads the entire file. – user5507535 Feb 14 '23 at 10:15
  • If Safari submits an `` to the server as in [this answer](https://stackoverflow.com/a/74536595/16462950), with `file.pipe(...)` replaced with `file.on("data", function(chunk) {console.log(chunk.length)})`, does the logged chunk size equal the entire file size? – Heiko Theißen Feb 14 '23 at 10:26
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/251857/discussion-between-user5507535-and-heiko-theissen). – user5507535 Feb 14 '23 at 10:53

Here is how I implemented it. In this case the API returns NDJSON as a stream (ReadableStream), and I read the stream in chunks with a reader. In NDJSON, the data is split by newlines, so each line is itself a plain JSON document, which I parse and push into the fetchedData variable.

var fetchedData = [];

fetch('LinkGoesHere', {
    method: 'get',
    headers: {
        'Authorization': 'Bearer TokenGoesHere' // this part is irrelevant and you may not need it for your application
    }
})
.then(response => {
    if (!response.ok) {
        throw new Error(`HTTP error! Status: ${response.status}`);
    }
    return response.body.getReader();
})
.then(reader => {
    const decoder = new TextDecoder(); // one decoder for the whole stream, so multi-byte characters split across chunks decode correctly
    let partialData = '';

    // Read and process the NDJSON response chunk by chunk
    return reader.read().then(function processResult(result) {
        if (result.done) {
            // Flush a trailing line that did not end with a newline
            if (partialData.trim()) {
                fetchedData.push(JSON.parse(partialData));
            }
            return;
        }

        partialData += decoder.decode(result.value, { stream: true });
        const lines = partialData.split('\n');

        // Every complete line is one JSON document
        for (let i = 0; i < lines.length - 1; i++) {
            if (lines[i].trim()) {
                fetchedData.push(JSON.parse(lines[i])); // Store the parsed JSON object in the array
            }
        }

        // Keep the (possibly incomplete) last line for the next chunk
        partialData = lines[lines.length - 1];

        return reader.read().then(processResult);
    });
})
})
.then(() => {
    // At this point, fetchedData contains all the parsed JSON objects
    console.log(fetchedData);
})
.catch(error => {
    console.error('Fetch error:', error);
});
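
The same reading loop can also be written with async/await, which some find easier to follow. Here is a minimal sketch of the same approach; fetchNdjson, url and token are placeholder names, and the Authorization header is again optional:

async function fetchNdjson(url, token) {
  const response = await fetch(url, {
    method: 'get',
    headers: { 'Authorization': `Bearer ${token}` }
  });
  if (!response.ok) {
    throw new Error(`HTTP error! Status: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const rows = [];
  let partial = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    partial += decoder.decode(value, { stream: true });
    const lines = partial.split('\n');
    partial = lines.pop(); // keep the incomplete trailing line for the next chunk
    for (const line of lines) {
      if (line.trim()) rows.push(JSON.parse(line));
    }
  }
  if (partial.trim()) rows.push(JSON.parse(partial)); // flush the last line

  return rows;
}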
Emre Bener