5

I have the following string:

    String line = "#food was testy. #drink lots of. #night was fab. #three #four";

I want to extract #food, #drink, #night, #three, and #four from it.

I tried this code:

    String[] words = line.split("#");
    for (String word: words) {
        System.out.println(word);
    }

But it prints "food was testy.", "drink lots of.", "night was fab.", "three" and "four", without the leading #.

Matthias Braun
Devendra Singh

2 Answers

17

split only cuts the string wherever it finds a #, and the # itself is discarded. That explains your current result.

You could extract the first word of each resulting piece, but the right tool for this task is a regular expression (regex).
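For comparison, here is a minimal sketch of the split-based route mentioned above. The class name `SplitHashtags` and the helper `firstWords` are hypothetical names chosen for illustration, and the sketch assumes the line starts with a # and that tags are separated from the following text by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class SplitHashtags {
    // Hypothetical helper: rebuilds tags from the pieces that split("#") produces.
    static List<String> firstWords(String line) {
        List<String> tags = new ArrayList<>();
        for (String piece : line.split("#")) {
            // Take the first whitespace-delimited token of each piece.
            String first = piece.trim().split("\\s+")[0];
            if (!first.isEmpty()) {
                tags.add("#" + first); // put back the '#' that split removed
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        for (String tag : firstWords(line)) {
            System.out.println(tag);
        }
    }
}
```

This works for the sample line, but it would wrongly report a tag if any text came before the first #, and it keeps punctuation attached to a tag (for example "#food." would stay as is). That fragility is one reason a regex is a better fit.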

Here is how you can achieve it with a regular expression:

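import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "#food was testy. #drink lots of. #night was fab. #three #four";

// Match a '#' followed by one or more word characters.
Pattern pattern = Pattern.compile("#\\w+");
Matcher matcher = pattern.matcher(line);

// find() advances to the next match; group() returns the matched text.
while (matcher.find()) {
    System.out.println(matcher.group());
}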

Output is:

#food
#drink
#night
#three
#four

The magic happens in "#\w+".

It searches for a # followed by one or more letters, digits, or underscores.

In Java source the pattern is written "#\\w+", because a backslash inside a string literal must itself be escaped as '\\'.
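To make the escaping point concrete, the following small sketch prints what the regex engine actually receives. The class name `EscapeDemo` and the constant `REGEX` are illustrative names, not part of the original answer.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Six characters in the source, four in the resulting string: '#', '\', 'w', '+'.
    static final String REGEX = "#\\w+";

    public static void main(String[] args) {
        System.out.println(REGEX);                            // prints #\w+
        System.out.println(REGEX.length());                   // prints 4
        System.out.println(Pattern.matches(REGEX, "#food"));  // prints true
    }
}
```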

You can play with it here.

find and group are explained here:

  • The find method scans the input sequence looking for the next subsequence that matches the pattern.
  • group() returns the input subsequence matched by the previous match.
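As a rough illustration of how find and group cooperate, this sketch also records where each match sits in the input, using the `start()` and `end()` methods of `Matcher`. The class `FindGroupDemo` and the helper `describeMatches` are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindGroupDemo {
    // Hypothetical helper: describes each match as "start-end:text".
    static List<String> describeMatches(String line) {
        Matcher matcher = Pattern.compile("#\\w+").matcher(line);
        List<String> out = new ArrayList<>();
        // Each call to find() resumes scanning where the previous match ended.
        while (matcher.find()) {
            out.add(matcher.start() + "-" + matcher.end() + ":" + matcher.group());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String description : describeMatches("#three #four")) {
            System.out.println(description);
        }
        // prints:
        // 0-6:#three
        // 7-12:#four
    }
}
```

Note that `end()` is exclusive: it is the index just past the last matched character.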

[edit]

Using \w can be a problem if you need to match accented or non-Latin characters, because by default it only matches ASCII letters, digits, and the underscore.

For example in:

"Bonjour mon #bébé #chat."

The matches will be:

  • #b
  • #chat

It depends on what you accept as a valid hashtag. But that is another question, and several discussions about it already exist.

For example, if you want letters from any language, #\p{L}+ looks good, but it does not include the underscore...
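Two possible ways to handle this are sketched below, under the assumption that a hashtag may contain letters and digits from any script plus the underscore. The first spells the character class out explicitly; the second keeps "#\w+" but compiles it with the `Pattern.UNICODE_CHARACTER_CLASS` flag. The class name `UnicodeHashtags` and the helper `extract` are illustrative, and the accented word is written with Unicode escapes only so that the source file stays plain ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeHashtags {
    // Letters (\p{L}), digits (\p{N}) and underscore, from any script.
    static final Pattern EXPLICIT = Pattern.compile("#[\\p{L}\\p{N}_]+");
    // Same "#\w+" as before, but \w now follows the Unicode definition.
    static final Pattern UNICODE_W =
            Pattern.compile("#\\w+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: returns every match of the pattern in the line.
    static List<String> extract(Pattern pattern, String line) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = pattern.matcher(line);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // The sample sentence from above, with the accented letter escaped.
        String line = "Bonjour mon #b\u00e9b\u00e9 #chat.";
        System.out.println(extract(Pattern.compile("#\\w+"), line)); // [#b, #chat]
        System.out.println(extract(EXPLICIT, line));   // both tags in full
        System.out.println(extract(UNICODE_W, line));  // both tags in full
    }
}
```

Which variant is right depends on the hashtag rules you want to enforce; neither is meant as a definitive definition.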

Community
Orace
  • This is working fine, but how can I get the matched word into a variable? – Devendra Singh Apr 03 '15 at 09:48
  • String mtch=matcher.group().toString(); Got it using this. :) Thanks a lot, both of you, @Orace and @Jitesh Ji. – Devendra Singh Apr 03 '15 at 09:53
  • You do not need `toString`, as the result of `group()` is already a string. Also, since you have multiple results, you may need a container to hold them all. I will edit my code. – Orace Apr 03 '15 at 09:58
  • You said \w matches any letter, so why doesn't it keep matching past the whitespace? – Devendra Singh Apr 03 '15 at 09:59
  • I have an ArrayList, so can I add to it like this: tag_list.add(matcher.group())? And will a for-each work when I read the values back? I have to store the matched words in SQLite, so do I need to do that? – Devendra Singh Apr 03 '15 at 09:59
  • The solution is indeed `while (matcher.find()) { tag_list.add(matcher.group()); }`; then you can use the `tag_list` container to retrieve all the tags (with a for-each). – Orace Apr 03 '15 at 10:02
  • I have done that. Thanks, senior @Orace. I will need such help again wherever I get stuck. :) – Devendra Singh Apr 03 '15 at 10:04
  • Ask a new question if you need help. I think this one is complete. – Orace Apr 03 '15 at 10:07
  • http://stackoverflow.com/questions/29435265/accessing-phones-gallery-path-to-create-bitmap-null-pointer – Devendra Singh Apr 03 '15 at 15:40
  • http://stackoverflow.com/questions/29485531/how-can-i-use-regex-to-get-data-from-sqlite – Devendra Singh Apr 07 '15 at 07:01
  • It does not include the @ character in the word. For example, in '#my@email.com is my email' the regex only returns '#my', but I want it to return the complete '#my@email.com'. Any help? – ZaEeM ZaFaR Jan 27 '21 at 09:06
  • @ZaEeMZaFaR try this pattern: "#[.@\\w]+" and look at this question: https://stackoverflow.com/questions/201323/how-to-validate-an-email-address-using-a-regular-expression – Orace Jan 27 '21 at 09:50
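The comment thread above settles on collecting the matches into a list instead of printing them. A minimal sketch of that idea follows; the class name `TagCollector` and the helper `collect` are hypothetical, and the regex is passed in as a parameter so that the "#[.@\\w]+" variant suggested in the last comment can be tried as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagCollector {
    // Hypothetical helper: gathers every match of the regex into a list.
    static List<String> collect(String regex, String line) {
        List<String> tagList = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(line);
        while (matcher.find()) {
            tagList.add(matcher.group()); // group() already returns a String
        }
        return tagList;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        System.out.println(collect("#\\w+", line));
        // prints [#food, #drink, #night, #three, #four]

        // Variant from the comments that also accepts '.' and '@' inside a tag.
        System.out.println(collect("#[.@\\w]+", "#my@email.com is my email"));
        // prints [#my@email.com]
    }
}
```

The returned list can then be iterated with a for-each loop, for example to store each tag in a database.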
-1

Follow this procedure:

    String candidate = "#food was testy. #drink lots of. #night was fab. #three #four";

    String regex = "#\\w+";
    Pattern p = Pattern.compile(regex);

    Matcher m = p.matcher(candidate);
    String val = null;

    System.out.println("INPUT: " + candidate);
    System.out.println("REGEX: " + regex + "\r\n");

    while (m.find()) {
        val = m.group();
        System.out.println("MATCH: " + val);
    }
    if (val == null) {
        System.out.println("NO MATCHES: ");
    }

This gives the following output (I ran and tested the program in the NetBeans IDE):

INPUT: #food was testy. #drink lots of. #night was fab. #three #four
REGEX: #\w+

MATCH: #food
MATCH: #drink
MATCH: #night
MATCH: #three
MATCH: #four

You will need the following imports:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
Jitesh Upadhyay