1

While filtering a list of strings, I want to match consecutive single characters as a whole word.

For example, given the strings below:

'm g road'
'some a b c d limited'

In the first case, it should match if the user types

"mg" or "m g" or "m g road" or "mg road"

In the second case, it should match if the user types

"some abcd" or "some a b c d" or "abcd" or "a b c d"

How can I do that? Can I achieve this using regex?

I can already handle the order of whole words by searching for the words one by one, but I am not sure how to treat consecutive single characters as a single word.

e.g. "mg road" or "road mg" I can handle by searching for "mg" and "road" one by one

EDIT

To make the requirement clearer, below is my test case:

@Test
public void testRemoveSpaceFromConsecutiveSingleCharacters() throws Exception {
    // assertEquals(expected, actual) reports both values when a case fails
    Assert.assertEquals("some abcd limited", Main.removeSpaceFromConsecutiveSingleCharacters("some a b c d limited"));
    Assert.assertEquals("mg road", Main.removeSpaceFromConsecutiveSingleCharacters("m g road"));
    Assert.assertEquals("bank abc", Main.removeSpaceFromConsecutiveSingleCharacters("bank a b c"));
    Assert.assertEquals("bank abc limited na", Main.removeSpaceFromConsecutiveSingleCharacters("bank a b c limited n a"));
    Assert.assertEquals("c road", Main.removeSpaceFromConsecutiveSingleCharacters("c road"));
}
Akhil
  • You can strip out spaces within space-surrounded single letters by `.replaceAll("(?<=\\b\\w) +(?=\\w\\b)","")` [like in this demo](http://fiddle.re/9kkdz6) (click Java). Do this for both: stringtocheck and userinput. Check if [stringtocheck .contains userinput](http://stackoverflow.com/a/2275035/5527985). – bobble bubble Jan 31 '16 at 18:18
  • 1
    @bobblebubble Yes, it worked. Thanks! Can you add it as an answer? – Akhil Feb 01 '16 at 06:20

6 Answers

1

Sounds like you simply want to ignore white space. You can easily do this by stripping out white space from both the target string and the user input before looking for a match.
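
A minimal sketch of this idea in Java, assuming a substring match is what "looking for a match" means here. The class and method names are hypothetical, chosen only for illustration.

```java
class WhitespaceInsensitiveMatch {
    // Hypothetical helper: drop every whitespace character from the string.
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Assumption: a "match" means the stripped input is a substring of the stripped target.
    static boolean matches(String target, String userInput) {
        return stripWhitespace(target).contains(stripWhitespace(userInput));
    }

    public static void main(String[] args) {
        System.out.println(matches("m g road", "mg road"));             // true
        System.out.println(matches("some a b c d limited", "a b c d")); // true
        System.out.println(matches("m g road", "road mg"));             // false
    }
}
```

Because all spaces are removed, word boundaries are lost and reordered input such as "road mg" is not found.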

Ted Hopp
  • Yeah, that can work, but I also want to match "road mg" and "road m g" in the first case. That is, I want to treat consecutive single characters as one word – Akhil Jan 31 '16 at 14:53
  • @Akhil - Ah. You didn't raise the possibility of out-of-order words in your initial post. So user input "road mg" should match, but how about "road g m"? Or even "road gm"? – Ted Hopp Jan 31 '16 at 18:53
  • Sorry for the unclear requirements; I have now added my test case to the question – Akhil Feb 01 '16 at 05:53
  • @Akhil - Your test cases have no example of reordering words in the way you described in your first comment here. The requirements are still not clear. What about my examples of "road g m" or "road gm"? – Ted Hopp Feb 01 '16 at 07:06
  • Yeah, I am handling that in separate logic: once I have a string like "mg road" or "road mg", I search for "mg" and "road" separately, as I mentioned in my post. The regex that @bobblebubble suggested in the comments section, .replaceAll("(?<=\\b\\w) +(?=\\w\\b)",""), works for the test cases I listed – Akhil Feb 01 '16 at 07:18
1

You basically want each search term to be modified to allow an optional space between its characters, so

"abcd" becomes regex "\ba ?b ?c ?d\b"

To achieve this, do this to each word before matching:

word = "\\b" + word.replaceAll("(?<=.)(?=.)", " ?") + "\\b";

The word boundaries \b are necessary to stop matching "comma bcd" or "abc duck".
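
The transformation above could be wrapped up roughly as follows. This is only a sketch: the class and method names are hypothetical, and it assumes each search word consists of plain word characters (regex metacharacters in the word are not escaped).

```java
import java.util.regex.Pattern;

class SpacedWordMatcher {
    // Turns "abcd" into the regex \ba ?b ?c ?d\b by inserting " ?" between every pair of characters.
    // Assumption: the word contains no regex metacharacters.
    static Pattern toSpacedPattern(String word) {
        String regex = "\\b" + word.replaceAll("(?<=.)(?=.)", " ?") + "\\b";
        return Pattern.compile(regex);
    }

    // True if the word occurs in the text, with or without single spaces between its characters.
    static boolean containsWord(String text, String word) {
        return toSpacedPattern(word).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(toSpacedPattern("abcd").pattern());                // \ba ?b ?c ?d\b
        System.out.println(containsWord("some a b c d limited", "abcd"));    // true
        System.out.println(containsWord("m g road", "mg"));                  // true
        System.out.println(containsWord("comma bcd", "abcd"));               // false
        System.out.println(containsWord("abc duck", "abcd"));                // false
    }
}
```

Since each word is checked on its own, calling containsWord once per search word also covers input in a different order, such as "road mg".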

Bohemian
1

This regex will match runs of single characters separated by one or more whitespace characters:

(^(\w\s+)+)|(\s+\w)+$|((\s+\w)+\s+)
Alan Moore
Ghayth
1

1.) Strip out spaces within space-surrounded single letters from stringtocheck and userinput.

.replaceAll("(?<=\\b\\w) +(?=\\w\\b)","")
  • (?<=\b\w) lookbehind: the spaces must be preceded by a word character \w that starts at a word boundary \b
  • (?=\w\b) lookahead: the spaces must be followed by a word character \w that ends at a word boundary \b

See demo at regexplanet (click Java)

2.) Check if stringtocheck .contains userinput.
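
Both steps could look roughly like this in Java. It is a sketch rather than a definitive implementation: the class name and the matches helper are hypothetical, and the method name is borrowed from the test case in the question.

```java
import java.util.regex.Pattern;

class SingleCharCollapser {
    // Runs of spaces that sit between two one-character words.
    private static final Pattern GAP = Pattern.compile("(?<=\\b\\w) +(?=\\w\\b)");

    // Step 1: remove the spaces between consecutive single characters.
    static String removeSpaceFromConsecutiveSingleCharacters(String s) {
        return GAP.matcher(s).replaceAll("");
    }

    // Step 2: normalize both strings, then do a plain substring check.
    static boolean matches(String stringToCheck, String userInput) {
        return removeSpaceFromConsecutiveSingleCharacters(stringToCheck)
                .contains(removeSpaceFromConsecutiveSingleCharacters(userInput));
    }

    public static void main(String[] args) {
        System.out.println(removeSpaceFromConsecutiveSingleCharacters("bank a b c limited n a")); // bank abc limited na
        System.out.println(removeSpaceFromConsecutiveSingleCharacters("c road"));                 // c road
        System.out.println(matches("m g road", "mg"));                                            // true
        System.out.println(matches("some a b c d limited", "some abcd"));                         // true
    }
}
```

Note that \w also covers digits and underscores, so runs such as "1 2 3" would be collapsed as well.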

Community
bobble bubble
0

The following regex (in multiline mode) could help you out:

^(?<first>\w+)(?<chars>(?:.(?!(?:\b\w{2,}\b)))*)
# assure that it is the beginning of the line
# capture as many word characters as possible in the first group "first"
# the construction afterwards consumes everything up to (not including)
# a word which has at least two characters...
# ... and saves it to the group called "chars"

You would then only need to remove the whitespace in the second group (aka "chars").
See a demo on regex101.com

Jan
-1
str = str.replaceAll("\\s","");
anaxin
  • I have another requirement about the ordering of words in the search term; please check the last portion of the question – Akhil Jan 31 '16 at 16:00