For a class assignment, I'm using data from https://www.kaggle.com/shivamb/netflix-shows, which has presented a small problem for me: it is a CSV file, but the cast field itself contains commas, which breaks the .split call I was using. Each row looks like [value, value, value, "value, value", value, ...]. The goal is to exclude the values within the " ".

Currently, to read the file I have:

    while ( inFile.hasNext() ) {
        String delims = "[,]";                               // Delimiters for separation
        String[] tokens = inFile.nextLine().split(delims);   // Split the line into a String array
        for (String token : tokens) {
            System.out.println(token);
        }
    }
  • Hi, please edit the question and show an example of the problematic CSV data. The link you provide requires a login to download the file, which nobody is going to do here. – OldProgrammer May 03 '21 at 21:40
  • The set of [value, value, etc...] is an example of the dataset. The parts in italics are the ones that should not be included; they are delimited by ", which I also bolded for easier reading. – Jonathan Starz May 05 '21 at 07:56

2 Answers

Because it's a class assignment, I would simply code the logic by hand. For each character, decide whether to add it to the current word or whether a new word has to start. That makes it pretty easy to keep track of whether you are inside the " " and react to it.

Something like this:

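import java.util.ArrayList;
import java.util.List;

public List<String> split(String line)
  {
    List<String> result = new ArrayList<>();
    StringBuilder currentWord = new StringBuilder();
    boolean inQuotes = false; // true while we are between two " characters
    for (int i = 0; i < line.length(); i++)
    {
      char c = line.charAt(i);
      if (c == ',' && !inQuotes)
      {
        // A comma outside quotes ends the current word
        result.add(currentWord.toString().trim());
        currentWord.setLength(0);
        continue;
      }
      if (c == '"')
      {
        // Toggle the quote state and drop the quote character itself
        inQuotes = !inQuotes;
        continue;
      }
      currentWord.append(c);
    }
    // Add the last word, which is not followed by a comma
    result.add(currentWord.toString().trim());
    return result;
  }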

There are some hard-core regular expressions for this, like here: Splitting on comma outside quotes, but I would not use them in an assignment.
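For reference, here is a minimal sketch of what such a regular expression can look like. It is a commonly used pattern of this kind, not necessarily the exact one from the linked question, and it assumes the quotes on each line are balanced and that no quoted value spans multiple lines. The class name QuoteAwareSplit, the method splitLine and the sample line are made up for illustration.

```java
class QuoteAwareSplit {
    // Matches a comma only if it is followed by an even number of
    // double quotes up to the end of the line, i.e. a comma outside quotes.
    static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    // Hypothetical helper: splits one CSV line; the limit -1 keeps trailing empty fields.
    static String[] splitLine(String line) {
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        // Made-up sample line in the shape described in the question
        String line = "s1,Movie,\"Actor One, Actor Two\",2020";
        for (String token : splitLine(line)) {
            System.out.println(token);
        }
    }
}
```

Note that the quoted value comes back as a single token with its quotes still attached, so it can be skipped with a check such as token.startsWith("\""). The lookahead rescans the rest of the line for every comma, so this is convenient for short lines rather than fast.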

Andreas Radauer
  • The page you linked covers essentially the same problem; I hadn't found it. (As for the assignment, I was looking for something concise to use in delims. It is allowed in our class.) – Jonathan Starz May 05 '21 at 09:09

I'm sure there is a simpler way of doing this but this is one solution I came up with.

    while ( inFile.hasNext() ) {
        boolean inQuotes = false; // True while we are inside a quoted value
        String delims = "[,]"; // Delimiters for separation
        String[] tokens = inFile.nextLine().split(delims);
        for (String token : tokens) {
            // Count the quote characters in this piece
            int quotes = token.length() - token.replace("\"", "").length();
            // The piece belongs to a quoted value if one is open or it contains a quote
            boolean quoted = inQuotes || quotes > 0;
            if (quotes % 2 == 1) { // An odd count opens or closes a quoted value
                inQuotes = !inQuotes;
            }
            if (!quoted) { // If not between quotes
                System.out.println(token.trim()); // Print without surrounding spaces
            }
        }
    }

    inFile.close();
Anthony
  • This also depends on the data you're working with. If each "value" is on its own line, this wouldn't work. It only works if the data is on the same line, which I assumed from the description. – Anthony May 03 '21 at 21:49