I have a .CSV data file with a ton, and I mean a TON (80+ million lines) of data.
The data is all in two columns, and looks like the following:
src | dst
123123 | 456456
321321 | 654654
987987 | 789789
123123 | 456456
and so on for 80 million lines.
(Note: I know the delimiter should be a ',' in a .CSV, but in this case it's a '|'. The file extension is still .CSV.)
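Because the sample rows have spaces around the '|', a plain `split('|')` leaves them attached (`'123123 '`), and the `src | dst` header would be counted as data. A minimal sketch of a line parser, assuming exactly one '|' per row and a header row of `src | dst`; `parseLine` is a hypothetical helper name, not from any library:

```javascript
// Hypothetical helper: split one line of the '|'-delimited file into { src, dst }.
// Assumes one '|' per data row and a header row of "src | dst".
function parseLine(line) {
  // Trim the spaces that surround each '|' in the sample data.
  const parts = line.split('|').map((s) => s.trim());
  // Skip blank lines, rows missing a field, and the header row.
  if (parts.length < 2 || parts[0] === '' || parts[0] === 'src') {
    return null;
  }
  return { src: parts[0], dst: parts[1] };
}
```

Returning `null` for lines to skip keeps the counting loop free of header and blank-line checks.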
I'm trying to write a program that reads in all the data and prints how many times each repeated value appears in the 'src' field. For the sample above, the output would include '123123: showed up 2 times'.
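Counting repeats comes down to a single pass over the src values with a hash map from value to count, then reporting the entries whose count is above 1. A minimal sketch, assuming the src values are already extracted as strings; `countValues` and `formatDuplicates` are hypothetical names. A `Map` is used instead of a plain object so integer-like keys such as `'123123'` keep insertion order and can't collide with inherited properties like `'constructor'`.

```javascript
// Count how often each value occurs, using a Map keyed by the value string.
function countValues(values) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return counts;
}

// Build "value: showed up N times" lines for values seen more than once.
function formatDuplicates(counts) {
  const lines = [];
  for (const [value, count] of counts) {
    if (count > 1) {
      lines.push(`${value}: showed up ${count} times`);
    }
  }
  return lines;
}

// The src column from the sample rows above.
const srcColumn = ['123123', '321321', '987987', '123123'];
console.log(formatDuplicates(countValues(srcColumn)).join('\n'));
```

For the four sample rows this prints `123123: showed up 2 times`; values that occur only once are left out.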
I've tried a few solutions, most notably this one: How to read the csv file properly if each row contains different number of fields (number quite big)?
I wrote a loop to split 'src' from 'dst', where 'data' holds the lines of the .CSV file and 'newData' is the array produced by splitting each line:
// go through each line and split it into src/dst
data.forEach(function (line) {
  // split returns an array; trim the spaces around each '|'
  const newData = line.split('|').map((s) => s.trim());
  const src = newData[0]; // src from data.csv
  const dst = newData[1]; // dst from data.csv
  // test print the data
  // console.log(newData);
});
But I'm having trouble counting the duplicate values in the newData[0] (src) column.
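With 80+ million lines, reading the whole file into an array first (as `data` appears to be) may exceed V8's maximum string length or available memory, so streaming the file may be safer. A sketch, assuming Node.js's built-in `fs` and `readline` modules and the same '|'-delimited layout with a `src | dst` header; `countSrcInFile` and `printDuplicates` are hypothetical names. It keeps only the text before the first '|', so `dst` is never stored.

```javascript
const fs = require('fs');
const readline = require('readline');

// Stream the file line by line so the full file is never held in memory at once.
// Assumes: '|' delimiter, optional spaces around it, and a "src | dst" header row.
async function countSrcInFile(path) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });

  const counts = new Map();
  for await (const line of rl) {
    const pipe = line.indexOf('|');
    if (pipe === -1) continue; // skip blank or malformed lines
    const src = line.slice(0, pipe).trim();
    if (src === 'src') continue; // skip the header row
    counts.set(src, (counts.get(src) || 0) + 1);
  }
  return counts;
}

// Print only the src values that occur more than once.
function printDuplicates(counts) {
  for (const [src, count] of counts) {
    if (count > 1) console.log(`${src}: showed up ${count} times`);
  }
}
```

Usage: `countSrcInFile('data.csv').then(printDuplicates);`. Memory now scales with the number of distinct src values rather than the number of lines; note that a single V8 `Map` is capped at roughly 2^24 (about 16.7 million) entries, so a column with more distinct values than that would need a different approach.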