
My program reads a list of domains from a text file and checks via regex whether each domain is valid. The problem is that some domains end with a strange kind of whitespace that I can't delete. The parsing problem occurs only when this strange whitespace is present. My regex is totally fine (!!! :-)

I tried trim() and replaceAll("\\s", ""), but neither works. How can I figure out what this strange char is?

Thanks!

toniedzwiedz
Crayl
  • Do you know what it is? Can you display its hex value? – AntonH May 31 '14 at 22:22
  • `replaceAll("\s", "")` wouldn't compile unless you escaped the backslash, but I digress. You'd do best to show us the regex you're using and the exact characters of the string you're having trouble with (`System.out.println(Arrays.toString(troubleString.toCharArray()));`). – Makoto May 31 '14 at 22:26
  • "there is some kind of strange whitespace at the end of some domains" - well, without a way to reproduce, I doubt that anyone will be able to help you... – Nir Alfasi May 31 '14 at 22:39
  • `System.out.println(" a b c d".replaceAll("\\s", ""));` works for me. – Nir Alfasi May 31 '14 at 22:39
  • @Crayl No problem, glad your problem is solved. – AntonH May 31 '14 at 23:05

1 Answer


I used this method to create a hex representation of the string: https://stackoverflow.com/a/18261315/558559
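For anyone who wants to do the same kind of inspection without following the link, here is a minimal sketch (not the code from the linked answer) that lists every code point of a string in hex together with its Unicode name, using `Character.getName` (Java 7+) and `String.codePoints()` (Java 8+). The class name `CharDump`, the helper `describe`, and the sample domain are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CharDump {

    // Returns one line per code point: hex value plus Unicode name,
    // e.g. "U+00A0 NO-BREAK SPACE".
    static List<String> describe(String s) {
        List<String> lines = new ArrayList<>();
        s.codePoints().forEach(cp -> {
            String name = Character.getName(cp); // null for unassigned code points
            lines.add(String.format("U+%04X %s", cp, name == null ? "<unassigned>" : name));
        });
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a domain with a trailing no-break space.
        String domain = "example.com\u00A0";
        for (String line : describe(domain)) {
            System.out.println(line);
        }
    }
}
```

Working on code points rather than raw `char`s keeps characters outside the Basic Multilingual Plane from being split into two surrogate halves in the dump.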

It turns out that the whitespace is a "no-break space" (U+00A0, decimal 160), described here: http://www.fileformat.info/info/unicode/char/a0/index.htm

To delete it, I used this solution:

stringWithStrangeWhiteSpace = stringWithStrangeWhiteSpace.replace(String.valueOf((char) 160), " ").trim();

Described here: https://stackoverflow.com/a/4728647/558559

I have no idea what I did there :D
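What the line above does: `(char) 160` is U+00A0, and `String.trim()` only strips characters up to U+0020, so it leaves U+00A0 alone. The default `\s` in `java.util.regex` matches only `[ \t\n\x0B\f\r]`, so `replaceAll("\\s", "")` misses it too. Replacing U+00A0 with an ordinary space first gives `trim()` something it recognizes. (`String.strip()` in Java 11+ would not help either, since `Character.isWhitespace` deliberately excludes U+00A0.) The sketch below, with a hypothetical `cleanDomain` helper, compares these approaches with a regex alternative that enables `UNICODE_CHARACTER_CLASS` via the embedded `(?U)` flag, under which `\s` matches every Unicode White_Space character, including U+00A0.

```java
public class NbspDemo {

    // Hypothetical helper: strip leading/trailing whitespace, including the
    // no-break space. (?U) makes \s Unicode-aware for the whole pattern.
    static String cleanDomain(String raw) {
        return raw.replaceAll("(?U)^\\s+|\\s+$", "");
    }

    public static void main(String[] args) {
        String raw = "example.com\u00A0"; // hypothetical input with a trailing NBSP

        // trim() only removes chars <= U+0020, so the NBSP (160) survives.
        System.out.println(raw.trim().length());                        // 12
        // Default \s does not include the NBSP either.
        System.out.println(raw.replaceAll("\\s", "").length());         // 12
        // The answer's approach: turn the NBSP into a plain space, then trim().
        System.out.println(raw.replace('\u00A0', ' ').trim().length()); // 11
        // Unicode-aware regex.
        System.out.println(cleanDomain(raw).length());                  // 11
    }
}
```

The regex variant also catches other Unicode spaces (for example U+2007 or U+3000) that a file might contain, at the cost of being a little less obvious than the single `replace`.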

Community
Crayl