I have a document whose title field has the value Mörder (with an umlaut on the o).

When I fetch it in Java using the following method, the value returned by both print statements is Morder with the umlaut on the r instead. Strange.

In the web UI provided by Solr, the title is displayed as Mörder (with an umlaut on the o).

Can anyone tell me what is going wrong?

    SolrQuery query = new SolrQuery();
    query.setParam("q", "<some query>");
    query.setStart(start);
    query.setRows(rows);
    query.setFacet(false);
    query.setFields("title");
    QueryResponse rsp = server.query(query);

    SolrDocumentList sdl = rsp.getResults();

    for (SolrDocument sdOl : sdl) {
        System.out.println(sdOl.getFieldValue("title"));
        System.out.println(new String(sdOl.getFieldValue("title").toString().getBytes(), "UTF-8"));
    }

EDIT

I am actually comparing document titles from two cores. One returns the umlauts correctly; the other always moves the umlaut to the next character.

JHS
  • Do you have the same configuration on both Solr containers? Which web server are you using? Are both configured to support UTF-8? – Jayendra Feb 12 '13 at 03:52

1 Answer


Is Unicode decomposition being messed up by a big-endian/little-endian byte conversion? Just a wild (half-joking) guess.
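To make the decomposition guess concrete, here is a minimal sketch that uses only `java.text.Normalizer` from the JDK. It assumes, without confirmation from the question, that one core returns the title in decomposed form ("o" followed by the combining diaeresis U+0308), which some consoles draw over the following character. The class name `UmlautCheck` and its helper methods are hypothetical, not part of SolrJ.

```java
import java.text.Normalizer;

public class UmlautCheck {

    // Lists the code points of a string so that a decomposed
    // "o" + U+0308 sequence can be told apart from a precomposed U+00F6.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Recomposes base letter + combining mark into a single code point (NFC).
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Assumed sample value: "Mörder" stored in decomposed form.
        String decomposed = "Mo\u0308rder";

        // U+004D U+006F U+0308 U+0072 U+0064 U+0065 U+0072
        System.out.println(codePoints(decomposed));

        // U+004D U+00F6 U+0072 U+0064 U+0065 U+0072
        System.out.println(codePoints(toNfc(decomposed)));
    }
}
```

Passing the value of `sdOl.getFieldValue("title")` from each core to `codePoints` would show whether the two cores return different code point sequences. If they do not, decomposition is not the cause and the byte-level check is the next step.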

Realistically, I have no answer, but I would run Wireshark and look at what the client is asking and what the server is answering. That will tell you whether the problem arises when the data leaves the server or when it arrives at the client.

I don't know your client configuration, but if the traffic comes through as binary, there are client options that will switch it to XML. If that by itself makes the problem go away, then the issue is with the javabin format. If it does not, at least you have the exact query and response to work from.

Alexandre Rafalovitch
  • I am actually comparing document titles from 2 cores. One returns correct umlaut however the other does not. – JHS Feb 11 '13 at 23:08