I have a CSV file of nearly 2 million rows with 3 columns (item, rating, user). I am able to load the data into a 2D String array or a list. My problem arises when I iterate over those arrays to build the data for new CSV files: the application stops responding, and I do not know how long I should expect to wait for it to finish.
Basically, my end goal is to parse a large CSV file and create a matrix in which each distinct item is a row and each distinct user is a column, with the rating at the intersection of the item and the user. From this matrix, I then create a cosine similarity matrix whose rows and columns both represent items, with the cosine similarity of two distinct items at their intersection.
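To make the similarity step concrete, here is a minimal sketch of the cosine similarity of two item rows, cos(a, b) = (a · b) / (|a| |b|). It assumes each row is stored sparsely as a map from user index to rating, with missing ratings treated as 0; the class name `CosineSketch` and the method names are hypothetical and not part of my program.

```java
import java.util.Map;

class CosineSketch {
    /** Cosine similarity of two sparse rating rows keyed by user index. */
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Only users present in both rows contribute to the dot product,
        // so iterate over the smaller row and look up in the larger one.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        // Assumption: a row with no non-zero ratings has similarity 0.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    /** Euclidean length of a sparse row. */
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }
}
```

With sparse rows, the cost of one comparison depends on how many ratings the two items actually have, not on the total number of users.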
I already know how to create CSV files, but my issue lies in the large loop structures that build the other arrays used for comparison.
Is there a better way to process and calculate large amounts of data so that my application doesn't freeze?
My current program does the following:
1. Take a large CSV file
2. Parse through the large CSV file
3. Create a 2D array resembling the original CSV file
4. Create a list of distinct items (each distinct item being represented by an index number)
5. Create a list of distinct users (each distinct user being represented by an index number)
6. Create a 2D array with row indexes representing items and column indexes representing users, so that array[row][column] = rating
7. Calculate the cosine similarity between pairs of item rows in that matrix
8. Create a 2D array with both row and column indexes representing items, so that array[row][column] = cosine similarity
I noticed that my program freezes when it reaches steps 4 and 5. If I remove steps 4 and 5, it still freezes at step 6.
I have attached the relevant portion of my code:
    FileInputStream stream = null;
    Scanner scanner = null;
    try {
        stream = new FileInputStream(fileName);
        scanner = new Scanner(stream, "UTF-8");
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine();
            if (!line.equals("")) {
                String[] elems = line.split(",");
                if (itemList.isEmpty()) {
                    itemList.add(elems[0]);
                }
                else {
                    if (!itemList.contains(elems[0]))
                        itemList.add(elems[0]);
                }
                if (nameList.isEmpty()) {
                    nameList.add(elems[2]);
                }
                else {
                    if (!nameList.contains(elems[2]))
                        nameList.add(elems[2]);
                }
                for (int i = 0; i < elems.length; i++) {
                    if (i == 1) {
                        if (elems[1].equals("")) {
                            list.add("0");
                        }
                        else {
                            list.add(elems[1]);
                        }
                    }
                    else {
                        list.add(elems[i]);
                    }
                }
            }
        }
        if (scanner.ioException() != null) {
            throw scanner.ioException();
        }
    }
    catch (IOException e) {
        System.out.println(e);
    }
    finally {
        try {
            if (stream != null) {
                stream.close();
            }
        }
        catch (IOException e) {
            System.out.println(e);
        }
        if (scanner != null) {
            scanner.close();
        }
    }
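
One likely contributor to the slowdown in steps 4 and 5 is that `contains` on a list scans the whole list, so every one of the ~2 million rows triggers a linear search. The sketch below is one possible alternative, not a tested replacement for the code above: it assigns index numbers through hash maps, and it stores ratings sparsely instead of in a dense items × users array. The class `RatingIndexSketch`, its fields, and the `load` method are hypothetical names; it assumes there is no header row, every non-empty line has three comma-separated fields, and every non-empty rating is numeric.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class RatingIndexSketch {
    // Distinct item/user -> index number, in order of first appearance.
    final Map<String, Integer> itemIndex = new LinkedHashMap<>();
    final Map<String, Integer> userIndex = new LinkedHashMap<>();
    // Sparse matrix: item index -> (user index -> rating).
    final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

    /** Returns the index for key, assigning the next free index if it is new. */
    static int indexOf(Map<String, Integer> index, String key) {
        Integer i = index.get(key);
        if (i == null) {
            i = index.size();
            index.put(key, i);
        }
        return i;
    }

    /** Reads "item,rating,user" lines one at a time and fills the maps. */
    void load(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines, as the original code does
                }
                String[] elems = line.split(",", -1);
                if (elems.length < 3) {
                    continue; // assumption: malformed lines are ignored
                }
                int item = indexOf(itemIndex, elems[0]);
                int user = indexOf(userIndex, elems[2]);
                // An empty rating is stored as 0, matching the original code.
                double rating = elems[1].isEmpty() ? 0.0 : Double.parseDouble(elems[1]);
                rows.computeIfAbsent(item, k -> new HashMap<>()).put(user, rating);
            }
        }
    }
}
```

It could be called as `new RatingIndexSketch().load(new FileReader(fileName))`. Each lookup in a hash map takes roughly constant time, so the indexing pass stays close to linear in the number of rows, and the sparse storage avoids allocating a cell for every item and user pair that has no rating.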