I'm new to this website, so please let me know if I did something wrong. I am working on a Six Degrees of Kevin Bacon project that reads an external CSV file into an unweighted graph and lets the user find the shortest-path distance from Kevin Bacon to another person. What I'm stuck on is how to correctly read the data from my CSV file, because two of the four columns contain entries in JSON format.

I appreciate anything that comes my way and feel free to ask me to elaborate if you need me to :)

I have tried to use the JSON.simple parser and I would like to stick with it, since it was easy to install and its functions are fairly straightforward. The external CSV file is huge, but here is what it looks like:

/*
movie_id,title,cast,crew
19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""credit_id"": ""5602a8a7c3a3685532001c9a"", ""gender"": 2, ""id"": 65731, ""name"": ""Sam Worthington"", ""order"": 0}, {""cast_id"": 3, 
*/

Here is what I've tried:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.util.Scanner;
import org.json.simple.parser.JSONParser;


public class MrBacon {



    public static void main(String[] args) throws Exception
    {
        // TODO Auto-generated method stub
        JSONParser parser = new JSONParser();

        if(args.length < 2)
        {
            throw new Exception("Input File Error");
        }
        Scanner reader = new Scanner(new FileInputStream(args[0]));

        int size = 5000;
        Graph graph = new Graph(size);

        try
        {
            BufferedReader br = new BufferedReader(new FileReader("tmdb_5000_credits.csv"));
            StringBuilder st = new StringBuilder();
            String title, line;
            String[] actors; 

            while ((line = br.readLine()) != null)
            {
                int col = 0;
                char[] words = line.toCharArray();
                for (int i = 0; i < words.length; i++)
                {
                    if (words[i] == ',')
                    {
                        col++;
                    }
                    else if(words[i] = )
                    {

                    }
                }
            }
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
    }
}
Angela
  • There are many great existing libraries for parsing CSV files. They handle things like escaping/unescaping special characters, and other details you probably don't want to handle yourself. https://stackoverflow.com/questions/10462507/any-good-library-to-read-and-write-csv-files – dnault May 07 '19 at 22:15
  • @dnault thank you! I checked it out and I think openCSV would be a good choice but I don't know how to implement it considering some movies have commas in their names and I need to be able to ignore those yet handle the one separating columns – Angela May 07 '19 at 22:28

1 Answer

If there is a single data source and it does not change, you could use an online CSV to JSON converter with a "Parse JSON" option, which turns the JSON values in the cast and crew columns into nested JSON objects.

You would then use JSON.simple, Gson or Jackson to parse the resulting pure JSON data.

For example,

movie_id,title,cast,crew
19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""credit_id"": ""5602a8a7c3a3685532001c9a"", ""gender"": 2, ""id"": 65731, ""name"": ""Sam Worthington"", ""order"": 0}]",null

becomes:

[
  {
    "movie_id": 19995,
    "title": "Avatar",
    "cast": [
      {
        "cast_id": 242,
        "character": "Jake Sully",
        "credit_id": "5602a8a7c3a3685532001c9a",
        "gender": 2,
        "id": 65731,
        "name": "Sam Worthington",
        "order": 0
      }
    ],
    "crew": null
  }
]
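
Once a cast value is plain JSON like the array above, pulling out the actor names with JSON.simple could look roughly like this. It is only a sketch: CastNames and namesOf are hypothetical names, and it assumes every cast entry is an object with a string "name" field, as in the sample row.

```java
import java.util.ArrayList;
import java.util.List;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

// Hypothetical helper class; only the org.json.simple types come from JSON.simple.
final class CastNames {

    // Returns the "name" of every entry in a cast array given as a JSON string.
    // Assumes each entry is a JSON object with a string "name" field.
    static List<String> namesOf(String castJson) throws ParseException {
        JSONArray cast = (JSONArray) new JSONParser().parse(castJson);
        List<String> names = new ArrayList<>();
        for (Object entry : cast) {
            JSONObject member = (JSONObject) entry;
            names.add((String) member.get("name"));
        }
        return names;
    }

    public static void main(String[] args) throws ParseException {
        // Shortened version of the Avatar cast entry from the question.
        String castJson = "[{\"cast_id\": 242, \"character\": \"Jake Sully\", "
                + "\"name\": \"Sam Worthington\", \"order\": 0}]";
        System.out.println(namesOf(castJson));
    }
}
```

The names returned for one movie are the people who would be linked to each other in the graph.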

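If this is not feasible, you can rely on the CSV parsing library to ignore the delimiter when it appears inside quotes.

If using Opencsv, look at the CSVParserBuilder class. Its #withIgnoreQuotations(boolean) method controls this behavior: as I read the Javadoc, true makes the parser treat quote characters like any other character, so it should stay false (the default) for commas inside quoted fields to be kept as data. The following is adapted from the description of the CSVReaderBuilder class.

// Quotes stay significant, so commas inside "..." are not treated as separators.
CSVParser parser = new CSVParserBuilder()
        .withSeparator(',')
        .withQuoteChar('"')
        .withIgnoreQuotations(false)
        .build();
// Skip the header row (movie_id,title,cast,crew) and use the parser above.
CSVReader reader = new CSVReaderBuilder(new FileReader("tmdb_5000_credits.csv"))
        .withSkipLines(1)
        .withCSVParser(parser)
        .build();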
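
To illustrate what ignoring the delimiter inside quotes means, here is a minimal hand-rolled sketch that uses only the JDK. CsvLineSplitter and split are hypothetical names, it assumes each record sits on a single physical line, and it is meant to show the idea rather than to replace a real CSV library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Opencsv or JSON.simple.
final class CsvLineSplitter {

    // Splits one CSV record into fields, honoring double quotes.
    // Assumption: the record does not span several physical lines.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas in here are data, not separators
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // separator outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the Avatar row from the question.
        String line = "19995,Avatar,\"[{\"\"cast_id\"\": 242, "
                + "\"\"name\"\": \"\"Sam Worthington\"\"}]\",null";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

The third field comes back as ordinary JSON with the doubled quotes collapsed, so it can be handed to a JSON parser directly. A movie title containing commas is handled the same way, provided the file quotes it.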

Personally I like the Jackson library. It supports JSON out of the box, and can be extended to support many other formats such as YAML and CSV.

spongebob