
I'm getting a date from a web page (HTML): " abril   2013  Viernes 19"

I've tried all the usual regular expressions, with no success.

Finally I looked at the string's bytes (str.getBytes()), and these are the values:

[-96, 97, 98, 114, 105, 108, -96, -96, -96, 50, 48, 49, 51, -96, -96, 86, 105, 101, 114, 110, 101, 115, -96, 49, 57]

What are these -96 values?

How can I replace one or more -96 bytes, or whatever kind of blank space this is, with a single space?

surfealokesea

4 Answers


The byte -96 (A0 in hexadecimal, or 160 as an unsigned byte) is the non-breaking space in the ISO-8859-1 character encoding, which is probably the encoding you used to transform the string to bytes (String.getBytes() with no argument uses the platform's default charset).
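A minimal sketch (the class and method names are only illustrative) showing that U+00A0 encodes to the single byte 0xA0 in ISO-8859-1, which Java prints as -96 because bytes are signed, while in UTF-8 it becomes two bytes. This is why the byte values you see depend on the charset passed to getBytes().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NbspBytes {
    static final String NBSP = "\u00A0"; // U+00A0 NO-BREAK SPACE

    static byte[] latin1() {
        return NBSP.getBytes(StandardCharsets.ISO_8859_1);
    }

    static byte[] utf8() {
        return NBSP.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ISO-8859-1: one byte, 0xA0, printed as -96 because Java bytes are signed
        System.out.println(Arrays.toString(latin1())); // [-96]
        // UTF-8: two bytes, 0xC2 0xA0
        System.out.println(Arrays.toString(utf8()));   // [-62, -96]
        // Masking with 0xFF recovers the unsigned value
        System.out.println(latin1()[0] & 0xFF);        // 160
    }
}
```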

JB Nizet
  • @rgettman I think there's a serial downvoter in this thread. A bunch of my posts were downvoted, but then the system picked it up and reverted most of them. – Vivin Paliath Apr 19 '13 at 16:56

The first byte (-96) is negative because in Java bytes are signed. It corresponds to character 160 (256 - 96), which is a non-breaking space. You'll need to specify that character directly in your regular expression.

// U+00A0 (160) is the non-breaking space; (char) -96 would be U+FFA0, a different character
str = str.replaceAll("\u00A0+", " ");
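The cast is the subtle part: casting the signed byte straight to char sign-extends it, so (char) -96 is U+FFA0 rather than U+00A0. A small sketch (names are illustrative) of converting a byte from an ISO-8859-1 array to the character it stands for, by masking with 0xFF first:

```java
public class SignedByteToChar {
    // ISO-8859-1 maps byte values 0-255 directly to U+0000-U+00FF
    static char latin1Char(byte b) {
        return (char) (b & 0xFF); // mask first to drop the sign extension
    }

    public static void main(String[] args) {
        byte b = -96;
        System.out.println((int) (char) b);      // 65440 (U+FFA0), not the NBSP
        System.out.println((int) latin1Char(b)); // 160 (U+00A0)
        // Replace each run of non-breaking spaces with a single space
        String s = "2013\u00A0\u00A0Viernes";
        System.out.println(s.replaceAll(latin1Char(b) + "+", " ")); // 2013 Viernes
    }
}
```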
rgettman

You should be able to use the Character.isSpaceChar method for this. As mentioned in a response to a related question, you can use it in a Java regex through \p{javaSpaceChar}, like this:

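// Same layout as the bytes in the question: NBSPs where the page shows blanks
String sampleString = "\u00A0abril\u00A0\u00A0\u00A02013\u00A0\u00A0Viernes\u00A019";
// The trailing + collapses each run of space characters into a single space
String result = sampleString.replaceAll("\\p{javaSpaceChar}+", " ");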

I think that will do exactly what you want while avoiding any need to deal with raw bytes.
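This also explains why the usual patterns failed: by default Java's \s matches only [ \t\n\x0B\f\r], so it ignores U+00A0. \p{javaSpaceChar} follows Character.isSpaceChar, which is true for U+00A0, and with the Pattern.UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag, Java 7+) \s also covers it. Character.isWhitespace, by contrast, deliberately excludes non-breaking spaces. A small comparison sketch (class and field names are illustrative):

```java
import java.util.regex.Pattern;

public class NbspRegex {
    static final String NBSP = "\u00A0"; // U+00A0 NO-BREAK SPACE

    // Default \s is ASCII-only: [ \t\n\x0B\f\r], so it never matches U+00A0
    static final Pattern ASCII_SPACE = Pattern.compile("\\s");
    // \p{javaSpaceChar} is equivalent to Character.isSpaceChar
    static final Pattern JAVA_SPACE_CHAR = Pattern.compile("\\p{javaSpaceChar}");
    // With UNICODE_CHARACTER_CLASS, \s follows the Unicode White_Space property
    static final Pattern UNICODE_SPACE = Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        System.out.println(ASCII_SPACE.matcher(NBSP).matches());     // false
        System.out.println(JAVA_SPACE_CHAR.matcher(NBSP).matches()); // true
        System.out.println(UNICODE_SPACE.matcher(NBSP).matches());   // true
        // Character.isWhitespace excludes non-breaking spaces; isSpaceChar does not
        System.out.println(Character.isWhitespace(NBSP.charAt(0)));  // false
        System.out.println(Character.isSpaceChar(NBSP.charAt(0)));   // true
    }
}
```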

DaoWen

I fixed it this way (if anyone has a better answer, I'd appreciate it):

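import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Encode explicitly as ISO-8859-1 so the non-breaking space is the single byte 0xA0 (-96)
byte[] b = str.getBytes(StandardCharsets.ISO_8859_1);
for (int i = 0; i < b.length; i++) {
    if (b[i] == (byte) 0xA0)
        b[i] = (byte) ' ';
}
String strOut = new String(b, StandardCharsets.ISO_8859_1).trim();
// \s already covers space, \t, \n, \x0B, \f and \r; "\b" in a Java string is a backspace, not a blank
Pattern blank = Pattern.compile("\\s+");
strOut = blank.matcher(strOut).replaceAll(" ");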
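The same clean-up can also be done on the String itself, which avoids depending on a byte encoding at all. A minimal sketch; the helper name normalizeSpaces is hypothetical. It collapses every run of ASCII whitespace or Unicode space characters (including U+00A0) into one space and then trims the ends:

```java
import java.util.regex.Pattern;

public class NormalizeSpaces {
    // Runs of ASCII whitespace (\s) or Unicode space separators such as U+00A0
    private static final Pattern BLANKS = Pattern.compile("[\\s\\p{javaSpaceChar}]+");

    // Hypothetical helper: collapse each blank run to a single space, then trim
    static String normalizeSpaces(String str) {
        return BLANKS.matcher(str).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String raw = "\u00A0abril\u00A0\u00A0\u00A02013\u00A0\u00A0Viernes\u00A019";
        System.out.println("[" + normalizeSpaces(raw) + "]"); // [abril 2013 Viernes 19]
    }
}
```

The replacement runs before trim() on purpose: String.trim() only strips characters up to U+0020, so a leading non-breaking space would otherwise survive.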

Thanks everybody for the help!

surfealokesea