
I'm doing the following request (using request/request) against a web service:

return request.postAsync({
    url,
    charset: 'Cp1252', // I also tried utf-8
    encoding: null, //
    // I also tried Cp1252 -> unknown encoding,
    // I also tried utf-8 and nothing at all
    headers: {
         "Accept": "application/octet-stream, text, text/plain, text/xml",
         "Accept-Encoding": "UTF-8",
         'Content-Type': "text/plain; charset=Cp1252;", // also tried utf-8, ISO-8859-1
         "User-Agent": "me"
    }
}).spread((res, body) => {
    body = body.toString();  // I also tried without toString();
    let ws = fs.createWriteStream('hhh.csv');
    ws.write(body);
    ws.end();
});

Whatever I do, umlauts are turned into �.

Those are the headers the web service sends back:

'content-type': 'text; charset=Cp1252',
'content-length': '1895980',
vary: 'Accept-Encoding,User-Agent'

I've been trying this for days with no luck at all. What am I doing wrong?

Here's a list of questions/answers that haven't solved my problem so far:

Can it be that one of the following causes my input string to not be UTF-8?

let hash = crypto.createHmac("sha256", this.options.Signer);
this.query['Signature'] = hash.update(stringToSign).digest("base64");

signer is a string containing 0-9, a-z, A-Z, +, and /.

this.query['Signature'] is part of the URL.
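
As a quick check on the signature step, here is a minimal sketch using Node's built-in crypto module. The `signer` and `stringToSign` values are made up for illustration. It shows that a base64 digest consists only of ASCII characters, so the signature itself should not be able to introduce non-UTF-8 bytes. Since `+`, `/` and `=` have special meaning in URLs, the sketch also percent-encodes the value, on the assumption that it is placed into the query string by hand.

```javascript
const crypto = require('crypto');

// Hypothetical values, for illustration only.
const signer = 'abcDEF123+/';
const stringToSign = 'GET\nexample.invalid\n/\nName=Grüße';

// HMAC-SHA256 over the UTF-8 bytes of the string, emitted as base64.
const signature = crypto
    .createHmac('sha256', signer)
    .update(stringToSign, 'utf8')
    .digest('base64');

// Base64 only uses A-Z, a-z, 0-9, '+', '/' and '=' padding, all plain ASCII.
const isAscii = /^[A-Za-z0-9+\/]+=*$/.test(signature);

// Percent-encode '+', '/' and '=' before putting the value into a URL.
const encodedSignature = encodeURIComponent(signature);

console.log(signature, isAscii, encodedSignature);
```

If a query-string library already builds the URL, it will usually do the percent-encoding itself, and encoding twice would break the signature.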

Peter Mortensen
baao
  • What are the exact characters you are sending? How are they currently encoded and what is the current character set for them? – zerkms Oct 05 '15 at 21:53
  • In the place where the umlaut is "still" umlaut. – zerkms Oct 05 '15 at 21:56
  • "node js request turns umlauts to �" "Whatever I do, umlauts are turned into �." --- what these phrases mean then? – zerkms Oct 05 '15 at 21:58
  • What are the exact characters you are sending? How are they currently encoded and what is the current character set for them? – zerkms Oct 05 '15 at 22:01
  • Have you tried changing the `defaultEncoding` of your [`writeStream`](https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options)? i.e. `fs.createWriteStream('hhh.csv', {defaultEncoding: 'utf8'});` – Yan Foto Oct 05 '15 at 22:03
  • The url does not open `This webpage is not available DNS_PROBE_FINISHED_NXDOMAIN` – zerkms Oct 05 '15 at 22:06
  • Good. The problem is solved then. – zerkms Oct 05 '15 at 22:13
  • Well, you mentioned that AWS is capable of sending utf-8 encoded data. Which means everything works fine. Otherwise provide more data as I asked 20 minutes ago. – zerkms Oct 05 '15 at 22:15
  • My comment 20 minutes ago had nothing to do with keys. Just read the first comment once again, please. – zerkms Oct 05 '15 at 22:20
  • If it's a proper utf-8 - then it's sent as utf-8 and then there is no problem with it. If it's utf-8 not sure why you put some other encoding but not utf-8. – zerkms Oct 05 '15 at 22:23
  • Have you tried cURLing to the web service without using Node? It may be an issue with the service you are using – Wargog Oct 05 '15 at 23:02
  • Done. PS: your solution is simply a workaround: you haven't fixed a problem, but consequences. Be prepared to be beaten by it once again some time later. – zerkms Oct 05 '15 at 23:44
  • But what could the problem really be? I double-checked the file's encoding as well as querystring's configuration; it's all UTF-8. Nothing else touches the data. I will read the document you just sent me tomorrow and have another look at request's source, but I really can't find what I'm missing. @zerkms. Btw, iconv takes quite long to convert, so I'll definitely have to have another look. Thanks for the deletion! – baao Oct 05 '15 at 23:49

1 Answer


I finally solved it by using iconv-lite and setting request's encoding to null, which makes it return the body as a Buffer. Here is my now-working configuration:

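// requires: const iconv = require('iconv-lite');
return request.getAsync({
    url,
    encoding: null, // return the body as a raw Buffer instead of a decoded string
    headers: {
        "Accept": "text, text/plain, text/xml",
        "Accept-Encoding": "UTF-8",
        'Content-Type': "text/plain; charset=utf-8;",
        "User-Agent": "me"
    }
}).spread((res, body) => {
    // body is already a Buffer, so it can be passed to iconv directly
    let a = iconv.decode(body, 'cp1252');
    // now a is holding a string with correct umlauts and ß
});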
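
To illustrate why the raw Buffer matters, here is a small self-contained sketch that needs no web service. The byte values are hand-written to mimic what a cp1252 server would send for "Grüße". It uses Node's built-in `'latin1'` decoding only as a stand-in: ISO-8859-1 and cp1252 agree on ä, ö, ü and ß, but differ for characters such as €, so for real data iconv-lite's `'cp1252'` remains the appropriate choice.

```javascript
// "Grüße" as single-byte cp1252: ü = 0xFC, ß = 0xDF.
const cp1252Bytes = Buffer.from([0x47, 0x72, 0xfc, 0xdf, 0x65]);

// Buffer#toString() defaults to UTF-8. Neither 0xFC nor a lone 0xDF is a
// valid UTF-8 sequence, so each becomes U+FFFD (the � replacement character).
const decodedAsUtf8 = cp1252Bytes.toString();

// Decoding with a matching single-byte charset keeps the umlauts intact.
// 'latin1' is a stand-in here (assumption: only ä, ö, ü, ß are involved).
const decodedAsLatin1 = cp1252Bytes.toString('latin1');

console.log(JSON.stringify(decodedAsUtf8));
console.log(decodedAsLatin1);
```

Once the bytes have been decoded as UTF-8, the original values are gone, so re-encoding the resulting string afterwards cannot bring the umlauts back. That is why the body has to stay a Buffer until it is decoded with the right charset.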
baao