2

Is it possible to improve the following code with a regular expression? The code works, but I would like to know whether there is a way to match all the dash characters that exist in Unicode, so that I can standardize them to a single dash.

Words:

48553−FS002
48553-FS002
48553 FS002
48553-FS002-ESD12

Java

String reference = "48553−FS002";
// Capture the first non-word character between two runs of word characters
String separator = reference.replaceFirst("\\w+(\\W)?\\w+", "$1");
if (!separator.equals(" ")) {
   reference = reference.replaceAll(separator, "-");
}

Alternatively, could I match the dashes by their Unicode code points? I was reading about the dash and about Java Regex Unicode, but I haven't managed to make it work.

Gdaimon

2 Answers

2

If you need to match any non-word character except a space, you may use

reference = reference.replaceAll("[^\\w ]", "-");

Or, with character class subtraction:

reference = reference.replaceAll("[\\W&&[^ ]]", "-");
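To check that the two patterns above behave the same way, here is a minimal sketch run against the sample values from the question. The class and method names are made up for illustration. Note that, by default, \w in Java only covers ASCII letters, digits and underscore, so a non-ASCII letter would also be turned into a hyphen.

```java
public class NonWordSeparators {
    // Negated class: anything that is neither a word character nor a space.
    static String viaNegation(String reference) {
        return reference.replaceAll("[^\\w ]", "-");
    }

    // Intersection: non-word characters, minus the space.
    static String viaIntersection(String reference) {
        return reference.replaceAll("[\\W&&[^ ]]", "-");
    }

    public static void main(String[] args) {
        // \u2212 is the Unicode minus sign used in the first sample value.
        String[] samples = {
            "48553\u2212FS002", "48553-FS002", "48553 FS002", "48553-FS002-ESD12"
        };
        for (String s : samples) {
            System.out.println(viaNegation(s) + " | " + viaIntersection(s));
        }
    }
}
```

The space-separated value is left untouched, and the value with two separators has both of them normalized.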

You can use the following pattern to match hyphen- or dash-like characters:

[\p{Pd}\u00AD\u2212]

Here,

  • \p{Pd} - matches any character in the Unicode category "Punctuation, Dash"
  • \u00AD - matches a soft hyphen
  • \u2212 - matches a minus symbol.
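As a rough sketch of how this pattern could be plugged into the code from the question: the class name DashNormalizer and the method normalize are hypothetical, chosen only for this example. The pattern is precompiled, on the assumption that many references will be processed.

```java
import java.util.regex.Pattern;

public class DashNormalizer {
    // Pd = "Punctuation, Dash"; U+00AD = soft hyphen; U+2212 = minus sign.
    private static final Pattern DASHES =
            Pattern.compile("[\\p{Pd}\\u00AD\\u2212]");

    // Replace every dash-like character with the ASCII hyphen-minus.
    static String normalize(String reference) {
        return DASHES.matcher(reference).replaceAll("-");
    }

    public static void main(String[] args) {
        System.out.println(normalize("48553\u2212FS002"));   // minus sign
        System.out.println(normalize("48553\u2013FS002"));   // en dash
        System.out.println(normalize("48553 FS002"));        // space is kept
        System.out.println(normalize("48553-FS002-ESD12"));  // already normalized
    }
}
```

Unlike the [^\\w ] approach, this only touches dash-like characters, so other punctuation and non-ASCII letters pass through unchanged.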
Wiktor Stribiżew
0

If you know your strings only contain word characters and separators, as seems to be the case, then you can just use

reference = reference.replaceAll("[^ \\w]", "-");
MikeM