How would you translate this JavaScript regex to Java?
It removes punctuation from a string:
strippedStr = str.replace(/[\.,-\/#!$%\^&\*;:{}=\-_`~()]/g,"");
How would you translate this JavaScript regex to Java?
It removes punctuation from a string:
strippedStr = str.replace(/[\.,-\/#!$%\^&\*;:{}=\-_`~()]/g,"");
Don't need the s///
stuff in there, you just pass the regular expression, and here it's a character class.
public static void main(String [] args)
{
String s = ".,/#!$%^&*;:{}=-_`~()./#hello#&%---#(($";
String n = s.replaceAll("[-.,/#!$%^&*;:{}=_`~()]", "");
System.out.println(n); // should print "hello"
// using POSIX character class which matches any of:
// !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
String p = s.replaceAll("\\p{Punct}", "");
System.out.println(p);
}
Syntax for Java Regular Expressions here: http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#sum
The POSIX character class used above covers a little more than you have, so I'm not sure if it fits your needs.
.
in the character class.-
to beginning of character class so it has no special meaningIf you expect this to work with all punctuation instead of just ASCII, you need to use:
String new_string = old_string.replaceAll("[\\pS\\pP]", "");
That’s because some of the things you are calling punctuation are actually symbols, as revealed by my uniprops script:
$ uniprops - \\ . , / '#' ! '$' % ^ '&' '*' ';' : { } = _ '`' '~' '(' ')'
U+002D ‹-› \N{ HYPHEN-MINUS }:
\pP \p{Pd}
All Any ASCII Assigned Common Zyyy Dash Dash_Punctuation Pd P Gr_Base Grapheme_Base Graph GrBase Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+005C ‹\› \N{ REVERSE SOLIDUS }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002E ‹.› \N{ FULL STOP }:
\pP \p{Po}
All Any ASCII Assigned Case_Ignorable CI Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation STerm Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002C ‹,› \N{ COMMA }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002F ‹/› \N{ SOLIDUS }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0023 ‹#› \N{ NUMBER SIGN }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0021 ‹!› \N{ EXCLAMATION MARK }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation STerm Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0024 ‹$› \N{ DOLLAR SIGN }:
\pS \p{Sc}
All Any ASCII Assigned Common Zyyy Currency_Symbol Sc S Gr_Base Grapheme_Base Graph GrBase Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+0025 ‹%› \N{ PERCENT SIGN }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+005E ‹^› \N{ CIRCUMFLEX ACCENT }:
\pS \p{Sk}
All Any ASCII Assigned Case_Ignorable CI Common Zyyy Dia Diacritic Sk S Gr_Base Grapheme_Base Graph GrBase Math Modifier_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+0026 ‹&› \N{ AMPERSAND }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+002A ‹*› \N{ ASTERISK }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+003B ‹;› \N{ SEMICOLON }:
\pP \p{Po}
All Any ASCII Assigned Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+003A ‹:› \N{ COLON }:
\pP \p{Po}
All Any ASCII Assigned Case_Ignorable CI Common Zyyy Po P Gr_Base Grapheme_Base Graph GrBase Other_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation Term Terminal_Punctuation XPosixGraph XPosixPrint XPosixPunct
U+007B ‹{› \N{ LEFT CURLY BRACKET }:
\pP \p{Ps}
All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Common Zyyy Ps P Gr_Base Grapheme_Base Graph GrBase Open_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+007D ‹}› \N{ RIGHT CURLY BRACKET }:
\pP \p{Pe}
All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Close_Punctuation Pe Common Zyyy P Gr_Base Grapheme_Base Graph GrBase Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+003D ‹=› \N{ EQUALS SIGN }:
\pS \p{Sm}
All Any ASCII Assigned Common Zyyy Sm S Gr_Base Grapheme_Base Graph GrBase Math Math_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+005F ‹_› \N{ LOW LINE }:
\w \pP \p{Pc}
All Any ASCII Assigned Common Zyyy Connector_Punctuation Pc P Gr_Base Grapheme_Base Graph GrBase ID_Continue IDC Punct PerlWord PosixGraph PosixPrint PosixPunct PosixWord Print Punctuation Word XID_Continue XIDC XPosixGraph XPosixPrint XPosixPunct XPosixWord
U+0060 ‹`› \N{ GRAVE ACCENT }:
\pS \p{Sk}
All Any ASCII Assigned Case_Ignorable CI Common Zyyy Dia Diacritic Sk S Gr_Base Grapheme_Base Graph GrBase Modifier_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+007E ‹~› \N{ TILDE }:
\pS \p{Sm}
All Any ASCII Assigned Common Zyyy Sm S Gr_Base Grapheme_Base Graph GrBase Math Math_Symbol Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Symbol XPosixGraph XPosixPrint XPosixPunct
U+0028 ‹(› \N{ LEFT PARENTHESIS }:
\pP \p{Ps}
All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Common Zyyy Ps P Gr_Base Grapheme_Base Graph GrBase Open_Punctuation Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
U+0029 ‹)› \N{ RIGHT PARENTHESIS }:
\pP \p{Pe}
All Any ASCII Assigned Bidi_M Bidi_Mirrored BidiM Close_Punctuation Pe Common Zyyy P Gr_Base Grapheme_Base Graph GrBase Punct Pat_Syn Pattern_Syntax PatSyn PosixGraph PosixPrint PosixPunct Print Punctuation XPosixGraph XPosixPrint XPosixPunct
Not a Java guy but:
public static final String expression = "[\\s\\p{Punct}]";
The g
at the end of the regexp means it's global; the equivalent Java String
method is replaceAll()
(which takes a search regexp and a replacement string). The only thing you need to do to the regexp itself is escape the \
s, since the Java parser will interpret things like \.
as you trying to escape a .
:
String strippedStr = str.replaceAll("[\\.,-\\/#!$%\\^&\\*;:{}=\\-_`~()]", "");
Wouldn't this character class get the job done in a more concise way?
str.replaceAll("[\\W_]", "");