Hi, we have a pattern applied that replaces characters. Could you please tell us which characters are replaced by the regex below? It is hard to work out just by reading it. I would like the list of characters this regex removes. Need help! Please find the sample below:

private String testRegEx(String myStr) {
    String regex = "[^\\s\\p{L}\\p{N}']|(?<=(^|\\s))'|'(?=($|\\s))";
    Pattern pattern = Pattern.compile(regex);       
    if(StringUtils.isNotEmpty(myStr)) {
        String firstString = myStr.replaceAll("\\r|\\n\"\'\"", "").replace("~^1~^", "").replaceAll("\\*", "").replaceAll("\\.", "");
        String res = pattern.matcher(firstString).replaceAll("");
        return StringUtils.normalizeSpace(res);
    } else {
        return StringUtils.EMPTY;
    }
}
Jacob G.
sat1219

3 Answers

If all you want is to know what was replaced, I suggest a quick debugging trick instead of trying to work it out by hand.

  1. Place a System.out.println(myStr) as the first line of the method.
  2. Replace the return with System.out.println(....) at the end of the if-branch; don't forget to change the method signature to return void, i.e. private void testRegEx(String myStr) {
  3. Check the console output and compare the two printed lines. You'll see immediately what was removed.
  4. Revert the changes, keeping any modification you want to make to your code.

EDIT:

  1. In step two, simply add System.out.println(...) and leave your code as is. Thanks to the commenter below who pointed this out. Much better way to do it.

So, for example, if you wish to compare against the result of StringUtils.normalizeSpace(res), add System.out.println(StringUtils.normalizeSpace(res)); just before that return line, in addition to the print from point 1.
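
The before/after comparison can also be done outside the application. The following is only a sketch: the class name `RemovedCharsDemo`, the helper `removed`, and the sample string are made up for illustration, and only the regex itself is taken from the question. It prints the input, the output, and every distinct substring the pattern matched (and would therefore delete).

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RemovedCharsDemo {
    // Same pattern as in the question.
    static final Pattern PATTERN =
            Pattern.compile("[^\\s\\p{L}\\p{N}']|(?<=(^|\\s))'|'(?=($|\\s))");

    // Hypothetical helper: collects every distinct match, in order of first appearance.
    static Set<String> removed(String input) {
        Set<String> hits = new LinkedHashSet<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "Hello, 'world' it's 100% ok!"; // made-up input
        System.out.println("before : " + sample);
        System.out.println("after  : " + PATTERN.matcher(sample).replaceAll(""));
        System.out.println("removed: " + removed(sample));
    }
}
```

For this sample the comma, the two apostrophes around "world", the percent sign, and the exclamation mark are matched, while the apostrophe inside "it's" is kept because it sits between two letters.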

spinyBabbler
  • Because that's the one he wants to look at, right? If not, add a print statement at whatever statement he wants to compare with and skip the remaining steps. – spinyBabbler Mar 08 '19 at 16:07
  • Sorry it was unclear. I mean, you can add the print at the end of the method without impacting the rest of the program. No need to return null nor to change the method signature. – vincrichaud Mar 08 '19 at 16:09
  • Good point. That was a silly redundant extra step. Edited the answer. – spinyBabbler Mar 08 '19 at 16:13
  • Hey, thanks a lot for the quick turnaround. Sure, I will try the steps above. I was also trying to understand the meaning of that regex. It was really hard to figure out the characters :( – sat1219 Mar 08 '19 at 16:21

I suggest using debuggex to find out how your regex works. You'll need to replace each double backslash '\\' with a single '\' in order to use the regex editor properly. Then you'll see the regex path drawn.
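
As a complement to a visual tool, the pattern can be annotated directly in Java using the `Pattern.COMMENTS` flag, which makes the engine ignore unescaped whitespace and anything from `#` to the end of a line. This is a sketch only: the class name `AnnotatedRegexDemo`, the helper `strip`, and the sample string are invented for illustration, and the comments are one reading of the three alternatives in the question's regex.

```java
import java.util.regex.Pattern;

class AnnotatedRegexDemo {
    // The question's pattern, one alternative per line, with explanatory comments.
    static final Pattern ANNOTATED = Pattern.compile(
            "[^\\s\\p{L}\\p{N}']   # any char that is not whitespace, a Unicode letter, a Unicode number or an apostrophe\n"
          + "| (?<=(^|\\s))'       # an apostrophe right after start of input or whitespace\n"
          + "| '(?=($|\\s))        # an apostrophe right before end of input or whitespace\n",
            Pattern.COMMENTS);

    // The pattern exactly as written in the question, for comparison.
    static final Pattern ORIGINAL =
            Pattern.compile("[^\\s\\p{L}\\p{N}']|(?<=(^|\\s))'|'(?=($|\\s))");

    // Hypothetical helper: deletes every match of p from s.
    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "Hello, 'world' it's 100% ok!"; // made-up input
        System.out.println(strip(ORIGINAL, sample));
        System.out.println(strip(ANNOTATED, sample));
    }
}
```

In short, the regex removes punctuation and symbols (including the underscore), plus apostrophes that open or close a word, while apostrophes inside a word such as "it's" survive.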

R. Karlus
  • I tried debuggex and replaced \\ with a single \, but it gives an error - Unexpected character "<" after "?". If I remove "<", it draws the regex path. I'm not sure about the "<". Can you please help with this? – sat1219 Mar 08 '19 at 17:10
  • Actually, I can't figure out the use of **<**. Sorry, but removing it from the regex gives you the same result. – R. Karlus Mar 08 '19 at 17:35
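
On the `<` raised in the comments above: in `java.util.regex`, `(?<=X)` is a positive lookbehind (the text before the current position must match X), whereas `(?=X)` is a positive lookahead (the text after it must match X). The error reported by the online tool may come from a regex flavor without lookbehind support, but that is an assumption. The sketch below, with a made-up class name, helper, and sample string, suggests that in Java dropping the `<` does change what gets removed.

```java
import java.util.regex.Pattern;

class LookbehindDemo {
    // Alternative from the question: an apostrophe PRECEDED by start of input or whitespace.
    static final Pattern LOOKBEHIND = Pattern.compile("(?<=(^|\\s))'");
    // Same text with "<" removed: a lookahead followed by an apostrophe at the same position.
    // It can only succeed for an apostrophe at the very start of the input.
    static final Pattern LOOKAHEAD = Pattern.compile("(?=(^|\\s))'");

    // Hypothetical helper: deletes every match of p from s.
    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "say 'hi' to O'Neil"; // made-up input
        System.out.println(strip(LOOKBEHIND, sample)); // say hi' to O'Neil
        System.out.println(strip(LOOKAHEAD, sample));  // say 'hi' to O'Neil (unchanged)
    }
}
```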

After some trial and error, as suggested by @SpinyBabbler, I got to know some of the characters. To avoid any confusion, we went with replacing only non-ASCII characters, following this link: Replace non ASCII character from string

and finally it worked with this pattern (backslashes doubled, as a Java string literal requires): str1.replaceAll("[^\\x00-\\x7F]|[\\u0001]", "");
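
A minimal sketch of what that replacement does, assuming the backslashes are doubled in Java source so the regex engine receives `\x00`, `\x7F` and `\u0001`. The class name `AsciiOnlyDemo`, the method `keepAscii`, and the sample text are hypothetical.

```java
class AsciiOnlyDemo {
    // Removes every character outside U+0000..U+007F, and additionally U+0001.
    static String keepAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]|[\\u0001]", "");
    }

    public static void main(String[] args) {
        // Made-up input: accented letters, an en dash, and a U+0001 control character.
        String sample = "caf\u00e9 \u2013 na\u00efve\u0001!";
        System.out.println(keepAscii(sample)); // caf  nave!
    }
}
```

Note the difference from the original regex: this version also drops accented and non-Latin letters, which `\p{L}` kept, and it keeps ASCII punctuation, which the original removed.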

Thank you for all the support provided.

sat1219
  • Terminology: `String` doesn't have ASCII characters; It has Unicode characters. You are removing all that are not in the [C0 Controls and Basic Latin](http://www.unicode.org/charts/nameslist/index.html) block, as well as U+0001 (␁). – Tom Blodget Mar 13 '19 at 00:39