I am trying to read a tab-separated (TSV) file in Java and want to store the values on each line in two variables (name: everything before the tab; url: everything after the tab). The file looks like this:

Name1 Lastname1 TAB directory1/subdir1/subdir11
Name2 SecondName2 Lastname2 TAB directory1/subdir2/subdir22

So I have 1) first and last names, separated by spaces, 2) a TAB, 3) a URL without spaces, and 4) a newline (after the last character of the URL, so that the next entry starts on a new line).

I followed a tutorial, and what I have so far is:

// Open TSV File
public static Scanner openFile(String path) {
    try {
        Scanner scan;
        scan = new Scanner(new File(path)); 
        System.out.println("TSV-File found");
        return scan;
    } catch (Exception e) {
        System.out.println("TSV-File not found");
    }
    return null;
}

public static void readFile(Scanner scan) {
    while(scan.hasNext()) { 
        String name = scan.next();
        String url = scan.next();
        System.out.printf("%s %s\n", name, url);
    }
}

The problem is in my readFile() method: I do not know how to say "take everything before the tab and store it in the variable name" and "take everything from the tab to the newline and store it in the variable url".

Thanks and greetings, Patrick

AtMakeIT

2 Answers

String::split

I do not know how to say "take everything before the tab and store it in the variable name" and "take everything from the tab to the newline and store it in the variable url".

Use the String::split method to chop the string into smaller strings. Specify the delimiter (TAB) used between fields on each line. You get back an array of String objects, one for each field of the line.

String[] fields = line.split( "\t" ) ;    // Chop string into smaller strings.
String name = fields[ 0 ] ;               // Annoying zero-based index counting.
String url = fields[ 1 ] ;

You should add some code to verify that the array contains the expected number of fields before you index into it.
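As a minimal sketch of such a check, the class below wraps the split in a helper that rejects any line without exactly two fields. The names TsvSplitDemo and parseLine are made up for this illustration. One assumption worth stating: it passes a limit of -1 to split, because the one-argument form drops trailing empty strings, so a line such as "name TAB" would otherwise come back as a single field.

```java
class TsvSplitDemo {

    // Hypothetical helper: splits one line into exactly two fields (name, url).
    static String[] parseLine(String line) {
        // A limit of -1 keeps trailing empty fields, so "name\t" yields 2 fields, not 1.
        String[] fields = line.split("\t", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException(
                "Expected 2 tab-separated fields but found " + fields.length + ": " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample line taken from the question; spaces inside the name are preserved.
        String[] fields = parseLine("Name2 SecondName2 Lastname2\tdirectory1/subdir2/subdir22");
        System.out.println("name = " + fields[0]);
        System.out.println("url  = " + fields[1]);
    }
}
```

Whether a malformed line should throw, be skipped, or be logged depends on your data; throwing is just the simplest choice for a sketch.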

Tip: Use a library to perform the chore of reading and writing your Tab-delimited files. I use the Apache Commons CSV library for such work. It handles a variety of CSV formats as well as Tab-delimited. Search Stack Overflow for examples, such as one I posted yesterday. In that example code, change the CSVFormat.RFC4180 to CSVFormat.TDF for Tab-delimited format.

Basil Bourque

I would use a different approach: BufferedReader. It gives you a stream of the file's lines, so you can process each line separately.

import java.io.*;

public class App {

    public static void main(String[] args) {
        try (BufferedReader reader = new BufferedReader(new FileReader("data.tsv"))) {
            reader.lines()                     // Get a stream of lines
                .map(line -> line.split("\t")) // Split every line by the tab character
                .forEach(App::doStuff);        // Call doStuff for every tokenized line
        } catch (IOException e) {
            System.out.println("Cannot open the file.");
        }
    }

    static void doStuff(String[] tokens) {
        if (tokens.length != 2) {
            throw new IllegalArgumentException("Cannot do stuff with an invalid line.");
        }

        String name = tokens[0]; // tokens[0] contains everything before the tab character
        String url = tokens[1];  // tokens[1] contains everything after the tab character

        System.out.printf("%s %s\n", name, url);
    }
}

If you really want to use a Scanner, you can specify the delimiters:

scan = new Scanner(new File(path)).useDelimiter("[\n\t]");

This makes the scanner treat only the tab and newline characters as delimiters. Note that this does not enforce the format 'name TAB url NEWLINE name TAB url'. A file laid out as 'name NEWLINE url TAB name TAB url' is read exactly the same way, because Scanner does not care which of the delimiters separates two tokens.
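To illustrate the point about delimiter order, here is a small sketch that feeds two differently laid-out strings to a Scanner configured this way. It reads from in-memory strings rather than a file so it can be run as-is; the class name ScannerDelimiterDemo and the helper tokens are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerDelimiterDemo {

    // Hypothetical helper: collects every token the scanner produces for the given text.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        // Only tab and newline are delimiters, so spaces stay inside a token.
        try (Scanner scan = new Scanner(text).useDelimiter("[\n\t]")) {
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Expected layout: name TAB url NEWLINE name TAB url
        System.out.println(tokens("Name1 Lastname1\tdir1/sub1\nName2 Lastname2\tdir2/sub2"));
        // Swapped delimiters: name NEWLINE url TAB name TAB url
        System.out.println(tokens("Name1 Lastname1\ndir1/sub1\tName2 Lastname2\tdir2/sub2"));
    }
}
```

Both calls yield the same four tokens, which is why this approach cannot detect a malformed line on its own.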

If you really, really want to use a Scanner and preserve a strict format, you can use two scanners: scan a line with the first one, then scan the name and URL from that line with the second. I think that is too complicated, though, and I would rather use BufferedReader, which reads line by line and so keeps the format strict.
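For completeness, a rough sketch of the two-scanner idea could look like the following. The class name TwoScannerDemo and the helper parseLine are hypothetical, and the sample data is held in a string instead of a file so the example is self-contained.

```java
import java.util.Scanner;

class TwoScannerDemo {

    // Hypothetical helper: reads exactly two tab-separated tokens from a single line.
    static String[] parseLine(String line) {
        try (Scanner fieldScan = new Scanner(line).useDelimiter("\t")) {
            if (!fieldScan.hasNext()) {
                throw new IllegalArgumentException("Missing name: " + line);
            }
            String name = fieldScan.next();
            if (!fieldScan.hasNext()) {
                throw new IllegalArgumentException("Missing url: " + line);
            }
            String url = fieldScan.next();
            if (fieldScan.hasNext()) {
                throw new IllegalArgumentException("Too many fields: " + line);
            }
            return new String[] { name, url };
        }
    }

    public static void main(String[] args) {
        String data = "Name1 Lastname1\tdirectory1/subdir1/subdir11\n"
                    + "Name2 SecondName2 Lastname2\tdirectory1/subdir2/subdir22\n";
        // The outer scanner walks the input line by line; the inner one splits each line.
        try (Scanner lineScan = new Scanner(data)) {
            while (lineScan.hasNextLine()) {
                String[] fields = parseLine(lineScan.nextLine());
                System.out.printf("%s %s%n", fields[0], fields[1]);
            }
        }
    }
}
```

To read from the file in the question, the outer scanner would be created with new Scanner(new File(path)) instead of the string.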

Tolik Pylypchuk