
Here is my basic problem: I am reading some lines in from a file. The format of each line in the file is this:

John Doe    123

There is a tab between Doe and 123.

I'm looking for a regex such that I can "pick off" the John Doe. Something like scanner.next(regular expression) that would give me the John Doe.

This is probably very simple, but I can't seem to get it to work. Also, I'm trying to figure this out without having to rely on the tab being there.

I've looked at "Regular Expression regex to validate input: Two words with a space between", but none of those answers worked. I kept getting runtime errors.

Some Code:

while (inFile.hasNextLine()) {
    String s = inFile.nextLine();
    Scanner string = new Scanner(s);
    System.out.println(s); // check to make sure I got the string
    // This doesn't work for me
    System.out.println(string.next("[A-Za-z]+ [A-Za-z]+"));
    // Nor does this
    System.out.println(string.next("\\b[A-Za-z ]+\\b"));
}
user678392
  • Have you got some code we could work with? – Bob Kuhar Feb 14 '12 at 05:42
  • (John).+(Doe) - http://docs.oracle.com/javase/tutorial/essential/regex/index.html – Brian Roach Feb 14 '12 at 05:44
  • possible duplicate of [String parsing in Java with delimeter tab "\t" using split](http://stackoverflow.com/questions/1635764/string-parsing-in-java-with-delimeter-tab-t-using-split) – Daniel Haley Feb 14 '12 at 05:45
  • Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html – Dapeng Feb 14 '12 at 07:35

4 Answers


Are you required to use regex for this? You could simply split each line on the tab character (\t) and grab the first or second element (I'm not sure which part you meant by "pick off" John Doe).
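A minimal sketch of that idea, assuming every line has exactly the name, one tab, then the number. The class name SplitNameDemo and the helper nameField are made up for illustration; String.split is the only library call used.

```java
public class SplitNameDemo {

    // Hypothetical helper: returns the text before the first tab.
    public static String nameField(String line) {
        // split("\t") breaks the line at every tab; element 0 is the name
        return line.split("\t")[0];
    }

    public static void main(String[] args) {
        String line = "John Doe\t123";
        String[] fields = line.split("\t");
        System.out.println(fields[0]); // the name part
        System.out.println(fields[1]); // the number part
    }
}
```

This does rely on the tab being there, so it only fits if the file format is guaranteed.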

Tim

It would help if you provided the code you're trying that is giving you runtime errors.

You could use regex:

[A-Za-z]+ [A-Za-z]+

if you always knew your name was going to be two words.

You could also try

\b[A-Za-z ]+\b

which matches any number of words (consisting only of letters) and makes sure it captures whole words (that's what the '\b' word boundary is for), so it returns "John Doe" rather than "John Doe " with a trailing space. Don't forget that backslashes need to be escaped in Java string literals.
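One thing worth checking when trying these patterns: Scanner.next(pattern) only tests the next token, and tokens are split on whitespace by default, so a pattern containing a space can never match the single token "John". A possible workaround, sketched here under the assumption that the name comes first on the line, is Scanner.findInLine, which ignores delimiters. The class FindInLineDemo and the helper firstMatch are hypothetical names.

```java
import java.util.Scanner;

public class FindInLineDemo {

    // Hypothetical helper: applies the regex to the line, ignoring delimiters.
    public static String firstMatch(String line, String regex) {
        try (Scanner scanner = new Scanner(line)) {
            return scanner.findInLine(regex); // null when nothing matches
        }
    }

    public static void main(String[] args) {
        String line = "John Doe\t123";
        System.out.println(firstMatch(line, "[A-Za-z]+ [A-Za-z]+"));
        System.out.println(firstMatch(line, "\\b[A-Za-z ]+\\b"));
    }
}
```

For the tab-separated sample line, both calls should give back the two-word name.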

mathematical.coffee
  • So I couldn't get either of these work. The first throws an exception, and the second only gets either the first or second word only (I don't remember which). – user678392 Feb 14 '12 at 05:58

This basically works to isolate John Doe from the rest...

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String isolateAndTrim( String candidate ) {
    // This pattern isolates "John Doe" as group 1; the digits follow it...
    Pattern pattern = Pattern.compile( "(\\w+\\s+\\w+)\\s+\\d*" );
    Matcher matcher = pattern.matcher( candidate );
    String clean = "";
    // matches() requires the whole line to fit the pattern
    if ( matcher.matches() ) {
        clean = matcher.group( 1 );
        // This replaceAll collapses runs of whitespace into a single space...
        clean = clean.replaceAll( "\\s+", " " );
    }
    return clean;
}

The grouping parentheses allow you to "pick off" the name portion from the digit portion: "John Doe", "Jane Austin", whatever. It is worth learning how groups work in regex, as they are well suited to problems like this one.
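As a further illustration of grouping, a second pair of parentheses can capture the number as well. This is only a sketch: the class GroupDemo and the helper parse are hypothetical, and the pattern assumes a two-word name followed by whitespace and at least one digit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {

    // Group 1 is the two-word name, group 2 is the trailing digits.
    private static final Pattern LINE = Pattern.compile("(\\w+\\s+\\w+)\\s+(\\d+)");

    // Hypothetical helper: returns {name, number}, or null if the line does not match.
    public static String[] parse(String line) {
        Matcher matcher = LINE.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("John Doe\t123");
        System.out.println(parts[0]); // name captured by group 1
        System.out.println(parts[1]); // digits captured by group 2
    }
}
```

Because the separator is matched with \s+, this works whether the name and number are separated by a tab or by spaces.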

The trick to remove the extra whitespace comes from How to remove duplicate white spaces in string using Java?

Bob Kuhar

Do you prefer simplicity and readability? If so, consider the following solution:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class MyLineScanner
{

    public static void readLine(String source_file) throws FileNotFoundException
    {
        File source = new File(source_file);

        // try-with-resources closes the file scanner when reading is done
        try (Scanner line_scanner = new Scanner(source))
        {
            while (line_scanner.hasNextLine())
            {
                String line = line_scanner.nextLine();

                // echo the line to confirm it was read
                System.out.println(line);

                // split the line on tabs; this works for me
                Scanner words_scanner = new Scanner(line);
                words_scanner.useDelimiter("\t");

                while (words_scanner.hasNext())
                {
                    System.out.format("word : %s %n", words_scanner.next());
                }
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException
    {
        readLine("source.txt");
    }

}
Jasonw