I am trying to get all words that begin with a letter from a long string. How would you do this in Java? I don't want to loop through every letter or do something similarly inefficient.

EDIT: I also can't use any built-in data structures (except arrays, of course); it's for a CS class. I can, however, make my own data structures (I have already created several).

Yahya Uddin
  • If it's any consolation, looking at every character in the string cannot be avoided (since you don't know *a priori* where the spaces are). – NPE Mar 14 '14 at 07:55
  • Let's see your solution / attempt (code) and worry about efficiency afterwards. Also, define "long". – reto Mar 14 '14 at 07:55
  • I think this could help: 1. Split http://stackoverflow.com/questions/3481828/how-to-split-a-string-in-java 2. Then Check for "letter" http://stackoverflow.com/questions/4450045/difference-between-matches-and-find-in-java-regex – Bjego Mar 14 '14 at 07:56
  • You could match a regex pattern: `\\be\\w*\\b` will match words that start with `e`. – donfuxx Mar 14 '14 at 08:00
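
As NPE's comment points out, every character has to be inspected at least once, so a single pass over the string is about as efficient as this can get. Below is a minimal sketch of such a pass that respects the constraint from the EDIT (arrays only). It assumes that words are separated by whitespace and that the match should ignore case; the class and method names are hypothetical.

```java
class WordScan {
    // Returns every whitespace-delimited word whose first character
    // equals `letter`, ignoring case. Uses only plain arrays.
    static String[] wordsStartingWith(String text, char letter) {
        char target = Character.toLowerCase(letter);
        String[] found = new String[8]; // grown by hand when full
        int count = 0;
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip the whitespace in front of the next word.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Advance to the end of the word.
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start && Character.toLowerCase(text.charAt(start)) == target) {
                if (count == found.length) {
                    String[] bigger = new String[found.length * 2];
                    System.arraycopy(found, 0, bigger, 0, count);
                    found = bigger;
                }
                found[count++] = text.substring(start, i);
            }
        }
        // Trim the result to the number of words actually found.
        String[] result = new String[count];
        System.arraycopy(found, 0, result, 0, count);
        return result;
    }

    public static void main(String[] args) {
        for (String w : wordsStartingWith("my very long string to test", 't')) {
            System.out.println(w); // prints "to", then "test"
        }
    }
}
```

Each character is visited once, so the running time is linear in the length of the string. If punctuation should also end a word, the two `Character.isWhitespace` checks are the place to change.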

7 Answers


You could split your String into an array of words and then iterate through it:

String s = "my very long string to test";

for(String st : s.split(" ")){
    if(st.startsWith("t")){
        System.out.println(st);
    }
}
Levenal

You need to be clear about some things. What is a "word"? You want to find only "words" starting with a letter, so I assume that words can have other characters too. But what chars are allowed? What defines the start of such a word? Whitespace, any non letter, any non letter/non digit, ...?

e.g.:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String testInput = "test séntènce îwhere I'm want,to üfind 1words starting $with le11ers.";
String regex = "(?<=^|\\s)\\pL\\w*";

// UNICODE_CHARACTER_CLASS makes \w match Unicode letters and digits, not just ASCII
Pattern p = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);

Matcher matcher = p.matcher(testInput);
while (matcher.find()) {
    System.out.println(matcher.group());
}

The regex (?<=^|\s)\pL\w* finds sequences that start with a letter (\pL is the Unicode property for letters), followed by zero or more "word" characters (Unicode letters and digits, because of the Pattern.UNICODE_CHARACTER_CLASS flag).
The lookbehind assertion (?<=^|\s) ensures that the sequence is preceded by either the start of the string or a whitespace character.

So my code will print:

test
séntènce ==> contains non ASCII letters
îwhere   ==> starts with a non ASCII letter
I        ==> 'm is missing, because `'` is not in `\w`
want
üfind    ==> starts with a non ASCII letter
starting
le11ers  ==> contains digits

Missing words:

,to     ==> starting with a ","
1words  ==> starting with a digit
$with   ==> starting with a "$"
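
If only words beginning with one particular letter are wanted, the same lookbehind idea can be narrowed by replacing \pL with that letter. The following is a sketch under assumptions, not part of the answer above: it assumes a case-insensitive match is desired, and the class and method names are hypothetical. Results are collected into a plain array by running the matcher twice (once to count, once to fill).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LetterRegex {
    // Returns the words that follow the start of the string or whitespace
    // and begin with `letter` (case-insensitive).
    static String[] wordsStartingWith(String text, char letter) {
        if (!Character.isLetter(letter)) {
            // A letter is never a regex metacharacter, so it can be
            // concatenated into the pattern safely.
            throw new IllegalArgumentException("expected a letter: " + letter);
        }
        Pattern p = Pattern.compile(
                "(?<=^|\\s)" + letter + "\\w*",
                Pattern.UNICODE_CHARACTER_CLASS
                        | Pattern.CASE_INSENSITIVE
                        | Pattern.UNICODE_CASE);
        Matcher m = p.matcher(text);

        // First pass: count the matches so the array can be sized exactly.
        int count = 0;
        while (m.find()) {
            count++;
        }

        // Second pass: collect the matches.
        String[] result = new String[count];
        m.reset();
        for (int i = 0; m.find(); i++) {
            result[i] = m.group();
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "the wind, $was warm; Winter 2was over";
        for (String w : wordsStartingWith(text, 'w')) {
            System.out.println(w); // prints "wind", "warm", "Winter"
        }
    }
}
```

As with the original pattern, "$was" and "2was" are skipped because the character right after the whitespace is not the requested letter.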
stema

You could build a HashMap keyed by first letter:

HashMap<String,String> map = new HashMap<String,String>();

For example, given the words

ant, bat, art, cat

the map would contain:

a -> ant,art
b -> bat
c -> cat

To find all words that begin with "a", just call

map.get("a")
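
Since the question rules out built-in data structures such as HashMap, the same idea can be approximated with arrays alone. The sketch below is one possible hand-rolled version, not the code from this answer: it assumes only the letters a-z matter (case-insensitively) and silently ignores words that start with anything else. The class name LetterIndex and its methods are hypothetical.

```java
class LetterIndex {
    // One bucket per letter a-z; each bucket is a plain array grown by hand.
    private final String[][] buckets = new String[26][];
    private final int[] sizes = new int[26];

    // Files the word under its first letter; words that do not start
    // with a-z or A-Z are ignored in this sketch.
    void add(String word) {
        if (word.isEmpty()) {
            return;
        }
        int b = bucketOf(word.charAt(0));
        if (b < 0) {
            return;
        }
        if (buckets[b] == null) {
            buckets[b] = new String[4];
        }
        if (sizes[b] == buckets[b].length) {
            String[] bigger = new String[sizes[b] * 2];
            System.arraycopy(buckets[b], 0, bigger, 0, sizes[b]);
            buckets[b] = bigger;
        }
        buckets[b][sizes[b]++] = word;
    }

    // Returns a copy of the words filed under the given letter.
    String[] get(char letter) {
        int b = bucketOf(letter);
        if (b < 0 || buckets[b] == null) {
            return new String[0];
        }
        String[] out = new String[sizes[b]];
        System.arraycopy(buckets[b], 0, out, 0, sizes[b]);
        return out;
    }

    // Maps a-z / A-Z to 0..25 and everything else to -1.
    private static int bucketOf(char c) {
        char lower = Character.toLowerCase(c);
        return (lower >= 'a' && lower <= 'z') ? lower - 'a' : -1;
    }

    public static void main(String[] args) {
        LetterIndex index = new LetterIndex();
        for (String w : "ant bat art cat".split(" ")) {
            index.add(w);
        }
        System.out.println(String.join(",", index.get('a'))); // prints ant,art
    }
}
```

Building the index still requires one pass over all the words, so it only pays off when the same text is queried for several different letters.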
Ankit Rustagi
import java.util.Scanner;

Scanner scan = new Scanner(text); // text is the string you are searching in
char test = 'x'; // whatever letter you are looking for
while (scan.hasNext()) {
    String wordFound = scan.next(); // next whitespace-delimited token
    if (wordFound.charAt(0) == test) {
        // do something with wordFound
    }
}

This will do what you are looking for; inside the if statement, do whatever you want with the word.

mig

You can use the split() method. Here is an example:

String string = "your string";
String[] parts = string.split(" C");

for (int i = 0; i < parts.length; i++) {
    String[] word = parts[i].split(" ");

    if (i > 0) {
        // The split consumed the leading "C", so put it back; the remaining
        // words in this part are ignored because they do not start with "C"
        System.out.println("C" + word[0]);
    } else {
        // Check the first part explicitly
        for (int j = 0; j < word.length; j++) {
            if (word[j].startsWith("c") || word[j].startsWith("C")) {
                System.out.println(word[j]);
            }
        }
    }
}

Here "C" is your letter. After splitting, just loop over the array. parts[0] has to be checked explicitly to see whether it starts with "C". It was my mistake to start looping from i=1; the loop should start from 0.

Stathis Andronikos
  • This is either too subtle, or completely wrong. (I suspect the latter.) If it's the former, please expand. – NPE Mar 14 '14 at 07:58
  • A test program will solve your question. Just test it! – Stathis Andronikos Mar 14 '14 at 07:59
  • This just splits every time it sees a c. I need to get all words beginning with c. – Yahya Uddin Mar 14 '14 at 08:01
  • this will just split at every letter `C` – donfuxx Mar 14 '14 at 08:04
  • I tested and it's ok now! – Stathis Andronikos Mar 14 '14 at 08:14
  • @StathisAndronikos: I did not vote, but the "fixed" version is still buggy. What if the first word begins with the letter "C"? What about lowercase "c"? Who said that the words are only delimited by spaces (this might be the case, but is not stated in the question)? – NPE Mar 14 '14 at 09:20
  • For the 1st word you have to check explicitly, and for the space you can substitute any delimiter character you have. As the question states that the words are inside a large string, I think it's logical to assume an empty string as delimiter for separation. Anyway, thanks for your remark about the 1st word. – Stathis Andronikos Mar 14 '14 at 09:41

You can take the first character of each word and use the Character.isLetter API method to check whether it is a letter.

String input = "jkk ds 32";
String[] array = input.trim().split("\\s+"); // tolerate repeated whitespace
for (String word : array) {
    if (word.isEmpty()) {
        continue; // an empty input yields a single empty token
    }
    char c = word.charAt(0); // first character of the word
    if (Character.isLetter(c)) {
        System.out.println(word + "\t isLetter");
    } else {
        System.out.println(word + "\t not Letter");
    }
}

Sample output:

jkk  isLetter
ds   isLetter
32   not Letter
Gaurav Gupta

Regexp way:

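import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String text = "my very long string to test";
    // Group 2 captures each word that follows the start of the string or a
    // non-word character; \\w+ (rather than \\w*) avoids empty matches
    Matcher m = Pattern.compile("(^|\\W)(\\w+)").matcher(text);
    while (m.find()) {
        System.out.println("Found: " + m.group(2));
    }
}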
Andrés Oviedo