16

I am trying to get the correct encoding with request.

request.get({
    "uri":'http://www.bold.dk/tv/',
    "encoding": "text/html;charset='charset=utf-8'"
  },
  function(err, resp, body){    
    console.log(body);
  }
);

No matter what I do, the Danish characters are not decoded correctly.

Any thoughts?

Tchoupi
hippie
  • 3
    You've mixed `encoding` with the `content-type` header -- e.g.: `"encoding": "utf-8"`. But, the page is encoded in `ISO-8859-1` rather than `UTF-8`. For that, see http://stackoverflow.com/questions/8915404/http-get-and-iso-8859-1-encoded-responses. – Jonathan Lonowski Aug 20 '12 at 16:48
  • @Amberlamps: I'm using Notepad++ – hippie Aug 20 '12 at 20:01
  • 1
    @hippie: Now this is a long shot, but sometimes I have the same issue with German letters. Every time that happens, it is because my Notepad++ saves my scripts as ANSI and not UTF-8. If it is in ANSI, try switching it to UTF-8. It is an option in your Notepad++ under Coding (I don't know what the English term is, because I am using the German version) – Amberlamps Aug 20 '12 at 20:07
  • I mixed it up and tried a lot of things. I tried both and nothing is working. – hippie Aug 20 '12 at 20:18
  • @Amberlamps: Just tried that, didn't work. – hippie Aug 20 '12 at 20:25
  • 1
    @Amberlamps: You are right that Notepad++ is the issue. I just ran it in cmd and it works fine. Thanks, all, for helping out – hippie Aug 20 '12 at 20:49
  • If you use `http.request()` and call `request.setHeader('Accept-Charset', 'UTF-8')`, that should change the HTTP header, and a conforming web server should deliver in UTF-8 – Sebastian Dec 19 '12 at 09:29

3 Answers

31

You can use iconv-lite to convert the response. You also need to stop request from decoding the body as UTF-8 (its default) by setting the encoding option to null, so that the body arrives as a raw Buffer. Therefore you should do:

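var request = require('request');
var iconv = require('iconv-lite');

request.get({
    uri: 'http://www.bold.dk/tv/',
    encoding: null // return body as a raw Buffer instead of a UTF-8 string
  },
  function(err, resp, body){
    if (err) {
      return console.error(err);
    }
    // Decode the raw bytes using the charset the page is actually served in
    var bodyWithCorrectEncoding = iconv.decode(body, 'iso-8859-1');
    console.log(bodyWithCorrectEncoding);
  }
);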
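
To see why the body has to stay a Buffer until it is decoded with the right charset, here is a minimal sketch that uses only Node's built-in Buffer, whose 'latin1' encoding corresponds to ISO-8859-1. The sample string and variable names are illustrative assumptions and are not taken from the page in the question; iconv-lite is still the better fit for charsets that Buffer does not support.

```javascript
// Illustrative sample: the Danish letters æ, ø, å, written as escapes so the
// result does not depend on how this script file itself is saved.
const danish = '\u00e6\u00f8\u00e5';

// Simulate the raw bytes an ISO-8859-1 server would send (one byte per letter).
const rawBody = Buffer.from(danish, 'latin1');

// Decoding those bytes as UTF-8 cannot recover the text:
// they are not valid UTF-8 sequences.
const decodedAsUtf8 = rawBody.toString('utf8');

// Decoding with the charset the server actually used restores the text.
const decodedAsLatin1 = rawBody.toString('latin1');

console.log(decodedAsUtf8 === danish);   // false
console.log(decodedAsLatin1 === danish); // true
```

Once the bytes have been turned into a string with the wrong charset, the original letters are lost, which is why the decoding step has to work on the Buffer.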
Jens Mikkelsen
3

The problem may be the 'Accept-Encoding' request header. Suppose you send a header like 'Accept-Encoding': 'gzip,deflate'; the server may then reply with a compressed body.

If so, there are two ways to fix this:

  1. Remove this header
  2. Use the following code to decompress the data:

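    const zlib = require('zlib')
    // Assumption: `request` is Node's http.request, whose callback receives the response stream
    const { request } = require('http')

    const req = request(options, res => {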
        let buffers = []
        let bufferLength = 0
        let strings = []
    
        const getData = chunk => {
            if (!Buffer.isBuffer(chunk)) {
                strings.push(chunk)
            } else if (chunk.length) {
                bufferLength += chunk.length
                buffers.push(chunk)
            }
        }
    
        const endData = () => {
            let response = {code: 200, body: ''}
            if (bufferLength) {
                response.body = Buffer.concat(buffers, bufferLength)
                if (options.encoding !== null) {
                    response.body = response.body.toString(options.encoding)
                }
                buffers = []
                bufferLength = 0
            } else if (strings.length) {
                if (options.encoding === 'utf8' && strings[0].length > 0 && strings[0][0] === '\uFEFF') {
                    strings[0] = strings[0].substring(1)
                }
                response.body = strings.join('')
            }
            console.log('response', response)
        };
    
        switch (res.headers['content-encoding']) {
            // or, just use zlib.createUnzip() to handle both cases
            case 'gzip':
                res.pipe(zlib.createGunzip())
                    .on('data', getData)
                    .on('end', endData)
                break;
            case 'deflate':
                res.pipe(zlib.createInflate())
                    .on('data', getData)
                    .on('end', endData)
                break;
            default:
                // No content-encoding header: the body is not compressed, so read it directly
                res.on('data', getData)
                    .on('end', endData)
                break;
        }
    });
    req.end() // if `request` is http.request, this is what actually sends the request
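
The comment in the switch above mentions that a single unzip step can handle both cases. As a rough sketch of that idea, the snippet below uses the synchronous counterpart `zlib.unzipSync` on an in-memory payload; the payload and variable names are made up for illustration and no network request is involved.

```javascript
const zlib = require('zlib');

// Illustrative payload (ISO-8859-1 bytes), not taken from the original page.
const original = Buffer.from('r\u00f8dgr\u00f8d med fl\u00f8de', 'latin1');

// Compress the same bytes in the two ways a server might.
const gzipped = zlib.gzipSync(original);
const deflated = zlib.deflateSync(original);

// unzipSync (like createUnzip for streams) detects the format from the header,
// so the same call works for both compressed bodies.
const fromGzip = zlib.unzipSync(gzipped);
const fromDeflate = zlib.unzipSync(deflated);

console.log(fromGzip.equals(original));    // true
console.log(fromDeflate.equals(original)); // true
```

The decompressed result is still a Buffer, so the charset decoding step comes after it.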
    
woolfi makkinan
0

I had the same problem with request v2.88.0.

Based on woolfi makkinan's answer, I found a simpler way to solve the problem.

request.get({
    "uri": 'http://www.bold.dk/tv/',
    "encoding": "utf8", // must be a Buffer encoding name, not a Content-Type value
    "gzip": true // notice this config
  },
  function(err, resp, body){    
    console.log(body);
  }
);

Adding gzip: true to the request options makes request send an Accept-Encoding header and decompress the gzipped response itself, so the body can then be converted to a string correctly.
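
To illustrate why the decompression step matters, here is a small sketch using Node's built-in zlib on a simulated response body. The text and variable names are assumptions for the example only, and it does not call the request library.

```javascript
const zlib = require('zlib');

// Illustrative text with Danish letters, written as escapes.
const text = 'Danske tegn: \u00e6 \u00f8 \u00e5';

// Simulate a gzip-compressed UTF-8 response body.
const compressedBody = zlib.gzipSync(Buffer.from(text, 'utf8'));

// Without decompression, the string is built from the compressed bytes.
const withoutGunzip = compressedBody.toString('utf8');

// Decompress first, then decode the bytes into a string.
const withGunzip = zlib.gunzipSync(compressedBody).toString('utf8');

console.log(withoutGunzip === text); // false
console.log(withGunzip === text);    // true
```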

ayunami2000
Chengyzh