
I have a requirement to clean a string of illegal barcode-39 data by changing each illegal character to whitespace. Currently, the only valid characters in barcode-39 are 0-9, A-Z, - (dash), . (dot), $ (dollar sign), / (forward slash), + (plus sign), % (percent sign), and space.

I tried the following regular expression, but the negation seems to apply only to the first group of characters.

barcode = barcode.toUpperCase().replaceAll("[^A-Z0-9\\s\\-\\.\\s\\$/\\+\\%]*"," ");

The code seems to interpret this only as "if not A to Z, then replace with a space". How do I make it mean "if not A-Z, and not 0-9, and not dash, and not dollar sign, and not forward slash, and so on, then replace the character with a space"?

Any help would be great.

Nap

2 Answers


Try changing your pattern string to [^-0-9A-Z.$/+% ]; this will match a single character that is not in the Code 39 specification. Also, if this is code that will get executed many times, avoid using String.replaceAll() since your pattern will get compiled for every method call. Instead, use a pre-compiled pattern as follows:

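// Compile once and reuse; matches any single character outside the Code 39 set.
private static final Pattern INVALID_CODE39_CHAR = Pattern.compile("[^-0-9A-Z.$/+% ]");
// Replace each invalid character with a space.
barcode = INVALID_CODE39_CHAR.matcher(barcode.toUpperCase()).replaceAll(" ");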

If you want to replace contiguous invalid characters with a single replacement string, add a + to the end of the pattern. The * in your original pattern matches zero or more characters that are not in your character class, so it also matches the empty string at every position; in effect, your replacement string (a space) gets inserted before and after every valid character.
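To make the difference between the quantifiers concrete, here is a minimal sketch that applies the same character class with no quantifier, with +, and with *. The class name QuantifierDemo and the helper replace are made up for illustration; only Pattern, Matcher.replaceAll, and the character class itself come from the answer above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper method are illustrative only.
class QuantifierDemo {
    // Any single character that is not valid Code 39 data.
    static final String INVALID_CLASS = "[^-0-9A-Z.$/+% ]";

    // Appends the given quantifier ("", "+" or "*") to the class and
    // replaces every match in the input with a single space.
    static String replace(String quantifier, String input) {
        return Pattern.compile(INVALID_CLASS + quantifier).matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        String input = "AB##C";
        // Brackets make leading and trailing spaces visible.
        System.out.println("[" + replace("", input) + "]");  // prints "[AB  C]"
        System.out.println("[" + replace("+", input) + "]"); // prints "[AB C]"
        System.out.println("[" + replace("*", input) + "]"); // prints "[ A B  C ]"
    }
}
```

With no quantifier, each invalid character becomes its own space; with +, a run of invalid characters collapses into one space; with *, the empty matches add spaces around the valid characters as well.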

Take a look at the Pattern JavaDoc for more information; also, this is very useful.

Go Dan

Why the "*" at the end? I would think that this isn't needed, and what's more will mess things up for you.
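As a rough check, the sketch below takes the character class from the question unchanged and drops only the trailing *. It suggests that the single caret already negates everything inside the brackets, so no extra carets are needed. The class name NoStarDemo and its clean method are hypothetical. One caveat, noted in the comments: \s accepts tabs and newlines as well as spaces, which is wider than the set of valid characters listed in the question.

```java
// Hypothetical demo class; the name and method are illustrative only.
class NoStarDemo {
    // Character class copied from the question, with only the trailing * removed.
    static final String QUESTION_CLASS = "[^A-Z0-9\\s\\-\\.\\s\\$/\\+\\%]";

    static String clean(String barcode) {
        return barcode.toUpperCase().replaceAll(QUESTION_CLASS, " ");
    }

    public static void main(String[] args) {
        // '#' and '_' are outside the class and are replaced; the dash is kept.
        System.out.println(clean("ab#12_x-9")); // prints "AB 12 X-9"
        // Caveat: \s matches any whitespace, so a tab passes through unchanged.
        System.out.println(clean("a\tb").equals("A\tB")); // prints "true"
    }
}
```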

Hovercraft Full Of Eels
  • I thought the * stands for any character that is not in the bracket. Do I need to place a caret character before each group of letters? – Nap Sep 16 '11 at 01:52
  • `*` is not for any character not in the bracket. It's for zero or more instances of the characters the bracket matches, and it will be "greedy". It adds extra spaces to your result String. – Hovercraft Full Of Eels Sep 16 '11 at 01:55