I am trying to print specific data from a .dat file that holds data on multiple movies. How would I print only the id and title columns?

Contents of the movies.dat file

id  title       imdbID      SpanishTitle
1   Toy story   0114709     juguetes
2   Jumanji     0113497     jumanji
3   Grumpy Old Men  0107050     Dos viejos grunoes
4      

My attempt in code

import java.io.*; 

public class ReadFromFile2 { 
    public static void main(String[] args)throws Exception {  
        File file = new File("movies.dat");  
        BufferedReader br = new BufferedReader(new FileReader(file)); 

        try {
            String line; 
            while ((line = br.readLine()) != null){
                String[] var = line.split(" ");

                System.out.println(line); 

            } 
        } finally {
            br.close();
        }
    }       
}
NewUser
  • Are those tab characters between columns? If so split the string on tab characters. https://stackoverflow.com/questions/3481828/how-to-split-a-string-in-java If they are spaces then you are probably going to need a tricky regex. – Bakon Jarser Jul 17 '19 at 23:40
  • This is a bit trickier than I thought, because "Grumpy", "Old" and "Men" all get split up. Are all IMDb IDs 7 digits? – Jeremy Kahan Jul 17 '19 at 23:52
  • Yes, they are all 7 digits – NewUser Jul 17 '19 at 23:57
  • @BakonJarser they are spaces, so one needs to split on the regex "\\s+", as you thought. Assuming that any 7-digit number is an IMDb ID (and not a word in a movie title), I was able to tell where the title ended and the next column began. – Jeremy Kahan Jul 18 '19 at 01:11
  • It's not what you asked, but it would probably be good to have a catch block – Jeremy Kahan Jul 18 '19 at 03:52
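
To illustrate the point made in the comments above about splitting on a single space versus the regex "\\s+", here is a minimal sketch. The class name SplitDemo, its two helper methods, and the exact spacing of the sample line are assumptions made for illustration; they do not come from the question.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on exactly one space, so a run of spaces yields empty strings.
    static String[] splitOnSingleSpace(String line) {
        return line.split(" ");
    }

    // Splits on one or more whitespace characters (spaces or tabs).
    static String[] splitOnWhitespace(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Assumed spacing: two spaces between columns.
        String line = "2  Jumanji  0113497";
        System.out.println(Arrays.toString(splitOnSingleSpace(line)));
        // prints [2, , Jumanji, , 0113497]
        System.out.println(Arrays.toString(splitOnWhitespace(line)));
        // prints [2, Jumanji, 0113497]
    }
}
```

With a single-space split, the empty strings shift the column positions, which is why indexing into the array directly would not reliably give the id and title.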

1 Answer

You need to split on runs of spaces, not just a single space. It also took some thought to work out where a title containing spaces ends and the next column begins. I relied on the IMDb ID always being a 7-digit number: the title is everything between the id and the first 7-digit token. I also took care not to be tripped up by the header line or by the short last line. Here is how it came out.

import java.io.*;

public class ReadFromFile2 {
    public static void main(String[] args) throws Exception {
        File file = new File("movies.dat");
        BufferedReader br = new BufferedReader(new FileReader(file));
        boolean first = true;
        try {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim(); // get rid of leading and trailing spaces to be safe
                String[] var = line.split("\\s+"); // split on one or more whitespace characters
                if (first) {
                    System.out.println(var[0] + " " + var[1]); // headers
                    first = false;
                } else if (var.length >= 2) { // enough to write
                    for (int i = 0; i < var.length; i++) {
                        if (var[i].matches("[0-9]+") && var[i].length() == 7) { // imdb
                            break;
                        }
                        if (i > 0) {
                            System.out.print(" ");
                        }
                        System.out.print(var[i]);
                    }
                    System.out.println();
                }
            }
        } finally {
            br.close();
        }
    }
}
Jeremy Kahan
  • Way too much work inside the loop. Compile the regex pattern once, outside the loop: Pattern idPattern = Pattern.compile("[0-9]+"); then call idPattern.matcher(var[i]).matches(); inside it. – Bakon Jarser Jul 18 '19 at 15:41
  • nice idea @BakonJarser. I was not aware of the performance implication of the regex inside the loop. – Jeremy Kahan Jul 18 '19 at 18:06
  • I once found a really good post that explained all the differences between String.matches and precompiled patterns, but I can't seem to find it now. This one isn't as complete as the one I originally read, but it's good: https://stackoverflow.com/questions/19829892/java-regular-expressions-performance-and-alternative – Bakon Jarser Jul 18 '19 at 23:10