I am getting a string from our database (a third-party tool), and I have trouble with one name. Sometimes it comes back correctly as "Tarsøy" and everything runs smoothly, but sometimes it comes back as "Tars00F8y", which breaks the process. I tried to write a validator function using URLDecoder.decode(name, "UTF-8") that takes a string and returns a validated one, but without success.

This is how I get a string from our database:

Database.WIKI.get(index); // the index is the ID of the string 
                            // this is a NoSQL DB 

As for "sometimes": the same code simply behaves differently from run to run. I suspect this is connected with internal DB exceptions or something similar. So I am trying to do something like validate(Database.WIKI.get(index)). Maybe I should try something like "Encode String to UTF-8"?

curiousity
  • Can you show us how you read the `String` from the database and how you use it? – icza Sep 02 '14 at 08:35
  • If it works sometimes it sounds like the problem is bad data in the database. – Keppil Sep 02 '14 at 08:37
  • You should provide a better definition of “sometimes”. Btw., it’s not surprising that your attempt with `URLDecoder.decode(name, "UTF-8")` failed. The string `"Tars00F8y"` is neither a URL nor UTF-8 encoded. – Holger Sep 02 '14 at 08:40
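
To illustrate the last comment, here is a minimal sketch of why the `URLDecoder` attempt cannot help. The class and method names (`UrlDecodeCheck`, `decodeUtf8`) are made up for this example. `URLDecoder` only rewrites percent escapes and `+`, so a string without them passes through unchanged:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UrlDecodeCheck {

    /** Hypothetical helper: URL-decodes the input, interpreting percent escapes as UTF-8 bytes. */
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No percent escapes in the broken name, so nothing is decoded.
        System.out.println(decodeUtf8("Tars00F8y").equals("Tars00F8y"));
        // A genuinely URL-encoded UTF-8 name is decoded to the expected string.
        System.out.println(decodeUtf8("Tars%C3%B8y").equals("Tars\u00F8y"));
    }
}
```

Both checks print `true`: the broken name is returned as-is, while a name that really was URL-encoded is decoded correctly.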

1 Answer

In Java, JavaScript and (especially interesting here) JSON, the notation \u00F8 exists for ø. I suspect that this escaped form was sent to the database, maybe from a specific browser on a specific computer locale, and that the \u was then stripped somewhere along the way, leaving 00F8. Possibly something is still there as an invisible control character in the string. That would be helpful for repairs.
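
One way to look for such an invisible character is to print the string's code points. This is only a sketch: `CodePointDump`, `describe` and `hasControlCharacter` are names invented for this example, and the string you pass in would be whatever `Database.WIKI.get(index)` returns.

```java
final class CodePointDump {

    /** Renders each code point of the input as U+XXXX, separated by single spaces. */
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    /** True if the string contains an ISO control character (U+0000..U+001F or U+007F..U+009F). */
    static boolean hasControlCharacter(String s) {
        return s.codePoints().anyMatch(Character::isISOControl);
    }

    public static void main(String[] args) {
        String good = "Tars\u00F8y"; // the expected name
        String bad = "Tars00F8y";    // the broken variant from the question
        System.out.println(describe(good));
        System.out.println(describe(bad));
        System.out.println(hasControlCharacter(bad));
    }
}
```

If the dump of a broken value shows anything other than the nine visible characters, that extra code point is a marker you could key a repair on.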

My guess is JSON data; however, JSON libraries should normally parse u-escaped characters, so this is weird.

Check what happens when storing "x\\u00FDx". Is the resulting length 6 (the \u is gone completely) or maybe 7 (lucky: one character survived in its place)?
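
If you are in the lucky case, a repair could look roughly like the sketch below. It rests on an assumption that has not been verified against your database: that exactly one ASCII control character remains where the \u used to be, directly followed by the four hex digits. `MarkerRepair` and `repair` are hypothetical names. Without such a marker there is no safe automatic repair, because four hex digits in a row can also be legitimate text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkerRepair {

    // Assumption: one ASCII control character survives in place of the
    // backslash-u prefix and is directly followed by four hex digits.
    private static final Pattern MARKED_HEX =
            Pattern.compile("\\p{Cntrl}([0-9A-Fa-f]{4})");

    /** Replaces each "control character + four hex digits" run with the character it encodes. */
    static String repair(String s) {
        Matcher m = MARKED_HEX.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against a decoded '$' or backslash.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulated broken value with an ESC character as the leftover marker.
        String marked = "Tars" + (char) 0x1B + "00F8y";
        System.out.println(repair(marked).equals("Tars\u00F8y"));
        // Without a marker the string is deliberately left alone.
        System.out.println(repair("Tars00F8y").equals("Tars00F8y"));
    }
}
```

You would then call it as `MarkerRepair.repair(Database.WIKI.get(index))`, but fixing the data where it is written remains the better option.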

Some sanity checks, assuming you work in UTF-8, especially if the data arrives via HTML or JS:

  • Content-Type header text/html; charset=UTF-8
  • (Optional) meta tag with charset=UTF-8
  • <form action="..." accept-charset="UTF-8">
  • JSON: contentType: "application/json; charset=UTF-8"
Joop Eggen