I'm developing a CSV parser that has to handle huge datasets (on the order of 10 million rows) in the browser.
Basically the parser works as follows:
1. The main thread reads a 20MB chunk (reading the whole file at once would crash the browser quickly) and sends that chunk to one of the workers (a rough sketch of this round trip follows the steps).
2. The worker receives the data, discards the columns I don't want and keeps the ones I do want. Normally I only want 4-5 columns out of 20-30.
3. The worker sends the processed data back to the main thread.
4. The main thread receives the data and pushes it into the data array.
5. Repeat steps 1-4 until the file is done.
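To make that concrete, here is a minimal sketch of the read/worker round trip. The file name csv-worker.js and the column indices are assumptions for illustration, and a plain split(',') ignores quoted fields and rows cut in half at chunk boundaries, so this only shows the shape of the pipeline, not the real parser.

// main.js (sketch): read the file in 20MB chunks and round-trip each chunk through a worker
const CHUNK_SIZE = 20 * 1024 * 1024;
const worker = new Worker('csv-worker.js'); // hypothetical worker file
const data = [];

function parseFile(file) {
  let offset = 0;
  worker.onmessage = (e) => {
    data.push(e.data);        // step 4: store the processed chunk
    if (offset < file.size) {
      readNextChunk();        // step 5: repeat steps 1-4 until the file is done
    }
  };
  readNextChunk();

  function readNextChunk() {
    const blob = file.slice(offset, offset + CHUNK_SIZE); // step 1: 20MB slice
    offset += CHUNK_SIZE;
    blob.text().then((text) => worker.postMessage(text));
  }
}

// csv-worker.js (sketch): step 2, keep only the wanted columns of each row
const WANTED = [20, 21, 17, 3, 4]; // hypothetical indices for latitude, longitude, year, block, IUCR

self.onmessage = (e) => {
  const rows = e.data.split('\n').map((line) => {
    const cols = line.split(',');
    return WANTED.map((i) => cols[i]);
  });
  self.postMessage(rows);     // step 3: send the kept columns back
};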
At the end, with the City of Chicago crimes dataset, I end up with an array that contains 71 other arrays, and each of those arrays holds roughly 90K rows. Each of these 90K rows is an array of 5 strings (the columns taken from the file): latitude, longitude, year, block and IUCR.
Summarizing: 71 is the number of 20MB chunks in the dataset, ~90K is the number of rows per 20MB chunk, and 5 is the number of columns extracted.
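Just to put rough totals on that structure (simple arithmetic from the counts above, not a separate measurement):

// rough totals implied by the numbers above
const chunks = 71;
const rowsPerChunk = 90 * 1000;
const columns = 5;
console.log(chunks * rowsPerChunk);           // ~6.4 million row arrays
console.log(chunks * rowsPerChunk * columns); // ~32 million column strings

So the final structure holds a few million small arrays and tens of millions of short strings.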
I noticed that the browser (Chrome) was using too much memory, so I ran the same load in 4 different browsers (Chrome, Opera, Vivaldi and Firefox) and recorded the memory used by the tab:
- Chrome - 1.76GB
- Opera - 1.76GB
- Firefox - 1.3GB
- Vivaldi - 1GB
If I recreate the same array with mock data, it only uses approx. 350MB of memory:
var data = [];
for (let i = 0; i < 71; i++) {
  let rows = [];
  for (let j = 0; j < 90 * 1000; j++) {
    rows.push(["029XX W MADISON ST", "2027", "-87.698850575", "2001", "41.880939487"]);
  }
  data.push(rows);
}
I understand that an array filled with constant literals, as in the code above, is easier for the engine to optimize than the dynamic case, where every value comes out of parsing. But I wasn't expecting it to use 5 times more memory for the same quantity of data.
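The one difference I can point at is that the mock reuses the same 5 string literals for every row, while the parser creates a new string for every cell. If that's where the gap comes from, a mock built from dynamically created strings should be a fairer comparison. An untested sketch (the split is only there to force fresh string objects on every iteration):

// same shape as the mock above, but every cell is a freshly created string,
// closer to what split()-based parsing produces
var dynamicData = [];
for (let i = 0; i < 71; i++) {
  let rows = [];
  for (let j = 0; j < 90 * 1000; j++) {
    rows.push("029XX W MADISON ST,2027,-87.698850575,2001,41.880939487".split(","));
  }
  dynamicData.push(rows);
}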
Is there anything I can do to make the parser use less memory?