77

I have Javascript in an XHTML web page that is passing UTF-8 encoded strings. It needs to continue to pass the UTF-8 version, as well as decode it. How is it possible to decode a UTF-8 string for display?

<script type="text/javascript">
// <![CDATA[
function updateUser(usernameSent){
    var usernameReceived = usernameSent; // Current value: GröÃe
    var usernameDecoded = usernameReceived;  // Decode to: Größe
    var html2id = '';
    html2id += 'Encoded: ' + usernameReceived + '<br />Decoded: ' + usernameDecoded;
    document.getElementById('userId').innerHTML = html2id;
}
// ]]>
</script>
Jon Adams
Jarrett Mattson
  • So what's your problem? give an example. – xiaoyi Nov 13 '12 at 06:43
  • I need to decode the UTF-8; Größe should be decoded from GröÃe – Jarrett Mattson Nov 13 '12 at 06:50
  • What's the `Größe`? It's not URL encoded. – xiaoyi Nov 13 '12 at 06:51
  • 6
    This is not a problem you solve with JavaScript. The way to solve it is to add an appropriate `meta` tag that declares the charset, and an XML declaration that declares the encoding. – icktoofay Nov 13 '12 at 06:53
  • And put the meta as the first tag in the `<head>` section. Sending a BOM to the client also does the job. – xiaoyi Nov 13 '12 at 06:55
  • I also need to keep it encoded in the same script. – Jarrett Mattson Nov 13 '12 at 06:59
  • 5
    *What?* As long as your webpage is encoded in UTF-8, js will treat strings as UTF-8 encoded, and `encodeURIComponent()` and `decodeURIComponent()` will assume the data is UTF-8 encoding. – xiaoyi Nov 13 '12 at 07:07
  • where and why the extra xml declaration? – Jarrett Mattson Nov 13 '12 at 07:18
  • 1
    "GröÃe" is not UTF-8 (well, it may be, but not intrinsically), it's a ***mess***. It's already broken. Several times, apparently. It doesn't need to be "decoded"; wherever it's failing and becoming broken needs to be *fixed*. Give more context information, otherwise it's hard to help. – deceze Nov 13 '12 at 07:18
  • That's how PHP encoded it apparently, seems to decode it just fine. It knows what to do with it after this, just can't display the text right. – Jarrett Mattson Nov 13 '12 at 07:21
  • Looks like `GröÃe` on the web page not decoded. – Jarrett Mattson Nov 13 '12 at 07:23
  • 1
    [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) and [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) – deceze Nov 13 '12 at 07:25
  • Where does your data come from, and how do you deliver it to the client? The encoding setting is needed at every step. http://allseeing-i.com/How-to-setup-your-PHP-site-to-use-UTF8 – xiaoyi Nov 13 '12 at 07:37
  • The data is a filename UTF8_encode by PHP. It's decoded into this page where it is eventually passed back to PHP on the same page (working). – Jarrett Mattson Nov 13 '12 at 07:42
  • If the data is encoded in UTF-8, there is no need to decode it before sending it to the client. Can you post the whole thing in your question? – xiaoyi Nov 13 '12 at 07:44
  • 4
    Don't randomly apply `utf8_encode`. Do you need it? Do you know why you need it? – deceze Nov 13 '12 at 07:50
  • If the user tries to use it then yes. It's not randomly applied, but done so file names won't break. – Jarrett Mattson Nov 13 '12 at 07:56
  • 3
    The "it" in "user tries to use it" refers to UTF-8? Then you don't need `utf8_encode`. Not necessarily. `utf8_encode` *transforms* the encoding of a string from ISO 8859-1 to UTF-8. It tries to do that even if the string is already UTF-8. UTF-8 "Größe" → `utf8_encode` → "GröÃe" → `utf8_encode` "GröÃÂe". If you apply it when you don't need it, your string screws up. – deceze Nov 13 '12 at 09:13
  • Ahh, I must be double encoding and decoding with PHP/XHTML for the filename. Is there a better way to make a filename, like MD5? What I'm still trying to do is decode UTF-8 with JavaScript! – Jarrett Mattson Nov 13 '12 at 20:53
  • 1
    I'm voting to close this question because it's completely misleading and it's only attracting equally misleading answers that only spread confusion. – Álvaro González Nov 17 '16 at 10:00
  • I agree with previous people about how misleading this thread is, but what most people are actually looking for is a pure javascript encoding/decoding library that will solve their encoding issues, so this is what I found when I googled on more than just **UTF8 encoding/decoding** : https://github.com/inexorabletash/text-encoding , **This is a paste from their README** : All encodings from the Encoding specification are supported, Enjoy ! – Olle Tiinus Feb 26 '19 at 10:21

15 Answers

179

To answer the original question: here is how you decode utf-8 in javascript:

http://ecmanaut.blogspot.ca/2006/07/encoding-decoding-utf8-in-javascript.html

Specifically,

function encode_utf8(s) {
  return unescape(encodeURIComponent(s));
}

function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

We have been using this in our production code for 6 years, and it has worked flawlessly.

Note, however, that escape() and unescape() are deprecated. See this.
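As the author notes in the comments, the call has to sit inside a try/catch block, because `decodeURIComponent` throws a `URIError` when the input is not valid UTF-8. A minimal sketch of that, assuming you want the original string back when decoding fails (the name `decodeUtf8Safely` and the fallback behaviour are my assumptions, not part of the answer):

```javascript
// Hypothetical wrapper around the decode_utf8 idea above.
// Assumption: on failure we return the input unchanged instead of throwing.
function decodeUtf8Safely(s) {
  try {
    // escape() is deprecated but still present in browsers and Node.js.
    return decodeURIComponent(escape(s));
  } catch (e) {
    // URIError: the string was not a sequence of UTF-8 bytes stored as characters.
    return s;
  }
}

// Each character below holds one UTF-8 byte of "Größe".
console.log(decodeUtf8Safely("Gr\u00C3\u00B6\u00C3\u009Fe")); // prints "Größe"
// An already decoded string is not valid as UTF-8 bytes, so it comes back as-is.
console.log(decodeUtf8Safely("Größe")); // prints "Größe"
```

This also means the function is safe to call on text that was never double-encoded in the first place.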

Anna
CpnCrunch
  • 1
    I've tried using the `decodeURIComponent(escape(usernameReceived))` and `decodeURIComponent(usernameReceived)`, but neither are transforming `usernameReceived`. Can you show some functional code? – Jarrett Mattson Jan 29 '14 at 20:25
  • Here is my code: s = decodeURIComponent( escape( s )); Note that you have to put it in a try/catch block. – CpnCrunch Jan 30 '14 at 21:23
  • Please considering marking the answer as accepted if it answers the question, or let me know if you still have problems with it. – CpnCrunch Nov 11 '15 at 18:59
  • 6
    This works for me. But as you know, the escape method is deprecated. We are using TypeScript and it's not there by default. So what is the best alternative to escape? encodeURI and encodeURIComponent don't work as replacements for escape here in this scenario, as they produce different output. – Joy George Kunjikkuru Dec 10 '15 at 15:56
  • Joymon: you would need to replace both the escape() and unescape(). I haven't tried it myself though. – CpnCrunch Dec 10 '15 at 22:27
  • not work of plugin **jquery UI tabs** ... necessary `` to work fine. thanks! – KingRider Sep 18 '17 at 13:34
  • 1
    I've been asked to add a comment because I downvoted this. But all I can say is that since escape is deprecated, this answer is not acceptable. Why is escape deprecated if it performs an important function? And why is there no native UTF-8 support in JavaScript? And why does no one care (last comment was two years ago). – David Spector Sep 05 '19 at 15:57
  • David: yes, you're correct. I've updated the answer to note that escape and unescape are deprecated. Replacing with encodeURIComponent and decodeURIComponent doesn't appear to work (unsurprising, as you would then be calling the same encode/decode function, resulting in no effect). I think it is unlikely that these will be removed soon, but if they are removed (or if you want to be safe), the best option would be to use lauthu or fakedrake's solution. – CpnCrunch Sep 06 '19 at 16:56
  • 5
    When a deprecated functionality is actually useful, the best way to prevent it from being removed is to keep using it instead of refraining from using it. Browser vendors use usage statistics to determine when to remove a feature. – GetFree Oct 04 '19 at 14:39
  • See also https://stackoverflow.com/a/37303214 for reference (implementation of `escape()`) – Andreas Feb 07 '22 at 11:39
  • This solution totally worked for me in order to accept Chinese text search in a NodeJS-based API implementation.. – Salocin.TEN Apr 12 '22 at 06:11
  • How is this deprecated function, apparently deprecated for over 7 years now, still the only simple way to parse a UTF-8 string such as this : `"{\"campaign_memberships\":{\"JÅ\u008dzai Corp\":\"admin\"}}"`? – Philipp Doerner Apr 29 '22 at 14:30
  • dont work with "Luças João" – Lucas Resende Jan 05 '23 at 13:18
  • Works perfectly with that string on chrome. When I encode and then decode it, it results in the original string again. – CpnCrunch Jan 05 '23 at 17:08
35

This should work:

// http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt

/* utf.js - UTF-8 <=> UTF-16 convertion
 *
 * Copyright (C) 1999 Masanao Izumo <iz@onicos.co.jp>
 * Version: 1.0
 * LastModified: Dec 25 1999
 * This library is free.  You can redistribute it and/or modify it.
 */

function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3;

    out = "";
    len = array.length;
    i = 0;
    while (i < len) {
        c = array[i++];
        switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                out += String.fromCharCode(c);
                break;
            case 12: case 13:
                // 110x xxxx   10xx xxxx
                char2 = array[i++];
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110 xxxx  10xx xxxx  10xx xxxx
                char2 = array[i++];
                char3 = array[i++];
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           ((char3 & 0x3F) << 0));
                break;
        }
    }

    return out;
}

Check out the JSFiddle demo.

Also see the related questions: here and here

Albert
  • 12
    Upvote for actually understanding what decoding UTF-8 is. –  Sep 10 '15 at 14:05
  • 7
    This code is incorrect. `fromCharCode` accepts UTF-16 values so you need to convert to UTF-16 before invoking it. – Simon Nov 01 '17 at 18:53
  • Some archaeology on the source: https://web-archive-org.translate.goog/web/20121116231954/http://www.onicos.com/staff/iz/amuse/javascript/expert/?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp – Fuhrmanator May 16 '22 at 19:57
29

Perhaps using `TextDecoder` will be sufficient.

Not supported in IE though.

var decoder = new TextDecoder('utf-8'),
    decodedMessage;

decodedMessage = decoder.decode(message.data);

Handling non-UTF8 text

In this example, we decode the Russian text "Привет, мир!", which means "Hello, world." In our TextDecoder() constructor, we specify the Windows-1251 character encoding, which is appropriate for Cyrillic script.

    let win1251decoder = new TextDecoder('windows-1251');
    let bytes = new Uint8Array([207, 240, 232, 226, 229, 242, 44, 32, 236, 232, 240, 33]);
    console.log(win1251decoder.decode(bytes)); // Привет, мир!

The interface for the TextDecoder is described here.

Retrieving a byte array from a string is equally simple:

const decoder = new TextDecoder();
const encoder = new TextEncoder();

const byteArray = encoder.encode('Größe');
// converted it to a byte array

// now we can decode it back to a string if desired
console.log(decoder.decode(byteArray));

If the bytes are in a different encoding, you must account for that when decoding. The parameter of the `TextDecoder` constructor can be any one of the valid encodings listed here; `TextEncoder` always produces UTF-8.
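`TextDecoder` works on bytes, not on strings. If what you have is a string in which every character holds one UTF-8 byte (which is what the value in the question looks like), you first have to turn it back into bytes. A rough sketch, assuming every character code is at most 0xFF; the helper name `decodeByteString` is made up for this example:

```javascript
// Hypothetical helper: treats each character of `s` as one byte and decodes
// the resulting byte sequence as UTF-8.
function decodeByteString(s) {
  const bytes = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code > 0xFF) {
      // Assumption: such input is not a byte string, so we refuse it.
      throw new RangeError("Character at index " + i + " is not a single byte");
    }
    bytes[i] = code;
  }
  return new TextDecoder("utf-8").decode(bytes);
}

console.log(decodeByteString("Gr\u00C3\u00B6\u00C3\u009Fe")); // prints "Größe"
```

If the string contains characters above 0xFF, it was decoded with some other encoding along the way and this approach does not apply.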

Jonathan
  • 3
    @ÁlvaroGonzález But it works and might be standard (future browsers will need to suport this too, okay?) –  Feb 02 '17 at 13:24
  • 5
    Nowadays this is not experimental, has great support in all modern browsers, and is absolutely the right choice for everybody (unless you still have to support IE) – Tim Perry Jun 17 '20 at 10:43
  • Where do i get the message.data from? – Jamie Hutber Jan 15 '21 at 21:10
  • @JamieHutber Perhaps you are looking for this?: https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder – Jonathan Jan 18 '21 at 09:27
  • this does not work for strings, only array buffers. – Juan Vilar Feb 15 '21 at 10:19
  • @JuanVilar So you already have a string, that is encoded in a specific way, that you can then use in the textEncoder to convert it into a array buffer that you can then use to convert it into a string with the desired encoding. – Jonathan Feb 16 '21 at 11:07
  • Decode with TextDecoder("utf-16") to half the string. In 16 bit javascript memory the upper byte is then used - so it halves the memory usage. But files is best with the default "utf-8" - it got bigger with "utf-16" when I compared the downloads. – Dan Froberg Jul 21 '23 at 01:14
11

An update to @Albert's answer that adds a case for 4-byte sequences such as emoji.

function Utf8ArrayToStr(array) {
    var out, i, len, c;
    var char2, char3, char4;

    out = "";
    len = array.length;
    i = 0;
    while (i < len) {
        c = array[i++];
        switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                out += String.fromCharCode(c);
                break;
            case 12: case 13:
                // 110x xxxx   10xx xxxx
                char2 = array[i++];
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110 xxxx  10xx xxxx  10xx xxxx
                char2 = array[i++];
                char3 = array[i++];
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           ((char3 & 0x3F) << 0));
                break;
            case 15:
                // 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
                char2 = array[i++];
                char3 = array[i++];
                char4 = array[i++];
                // fromCodePoint emits the surrogate pair for code points above 0xFFFF
                out += String.fromCodePoint(((c & 0x07) << 18) |
                                            ((char2 & 0x3F) << 12) |
                                            ((char3 & 0x3F) << 6) |
                                            (char4 & 0x3F));
                break;
        }
    } // end of while loop (this closing brace was missing)

    return out;
}
lauthu
  • 1
    Note: This works on a well formed UTF-8 input, but breaks without notice on some conditions: For example it assumes that there are correct number of bytes left, and that they are of correct continue sequence `0b10xxxxxx`, and in `case 15` it should only match `0b11110xxx` or it can decode an illegal code point. – some Feb 05 '20 at 15:35
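If you need the malformed cases from the comment above to fail loudly rather than silently, one option (not the approach of this answer, but the built-in Encoding API) is a `TextDecoder` created with `fatal: true`, which throws a `TypeError` on invalid input. A small sketch; the function name `decodeUtf8Strict` is made up for this example:

```javascript
// Hypothetical helper: strict UTF-8 decoding with the built-in TextDecoder.
// With { fatal: true }, truncated or invalid sequences throw a TypeError
// instead of being replaced or mis-decoded.
function decodeUtf8Strict(bytes) {
  const decoder = new TextDecoder("utf-8", { fatal: true });
  return decoder.decode(Uint8Array.from(bytes));
}

console.log(decodeUtf8Strict([71, 114, 195, 182, 195, 159, 101])); // prints "Größe"

try {
  decodeUtf8Strict([195]); // lead byte without its continuation byte
} catch (e) {
  console.log(e instanceof TypeError); // prints true
}
```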
10

Here is a solution that handles all Unicode code points, including the upper (4-byte) values, and is supported by all modern browsers (IE and others > 5.5). It uses decodeURIComponent(), but NOT the deprecated escape/unescape functions:

function utf8_to_str(a) {
    for(var i=0, s=''; i<a.length; i++) {
        var h = a[i].toString(16)
        if(h.length < 2) h = '0' + h
        s += '%' + h
    }
    return decodeURIComponent(s)
}

Tested and available on GitHub

To create UTF-8 from a string:

function utf8_from_str(s) {
    for(var i=0, enc = encodeURIComponent(s), a = []; i < enc.length;) {
        if(enc[i] === '%') {
            a.push(parseInt(enc.substr(i+1, 2), 16))
            i += 3
        } else {
            a.push(enc.charCodeAt(i++))
        }
    }
    return a
}

Tested and available on GitHub
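To make the trick easier to follow: the bytes are first written out as a percent-encoded string, and `decodeURIComponent` then interprets those escapes as UTF-8. A short illustration of that intermediate step; the helper name `bytesToPercentString` is made up here, and `padStart` assumes an ES2017 environment (unlike the answer's code, which avoids it):

```javascript
// Hypothetical helper: turns bytes into a percent-encoded string such as "%c3%b6".
function bytesToPercentString(bytes) {
  return Array.from(bytes, (b) => "%" + b.toString(16).padStart(2, "0")).join("");
}

const percent = bytesToPercentString([71, 114, 195, 182, 195, 159, 101]);
console.log(percent);                     // prints "%47%72%c3%b6%c3%9f%65"
console.log(decodeURIComponent(percent)); // prints "Größe"
```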

Matthew Voss
9

This is what I found after a more specific Google search than just "UTF-8 encode/decode". For those who are looking for a library that converts between encodings, here you go.

https://github.com/inexorabletash/text-encoding

var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);

Paste from repo readme

All encodings from the Encoding specification are supported:

utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250 windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le x-user-defined

(Some encodings may be supported under other names, e.g. ascii, iso-8859-1, etc. See Encoding for additional labels for each encoding.)

Olle Tiinus
  • This is best way work for me.thanks, for more info click https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder/TextDecoder – henrry Jun 16 '20 at 11:39
6

@albert's solution was the closest I think but it can only parse up to 3 byte utf-8 characters

function utf8ArrayToStr(array) {
  var out, i, len, c;
  var char2, char3;

  out = "";
  len = array.length;
  i = 0;

  // XXX: Invalid bytes are ignored
  while(i < len) {
    c = array[i++];
    if (c >> 7 == 0) {
      // 0xxx xxxx
      out += String.fromCharCode(c);
      continue;
    }

    // Invalid starting byte
    if (c >> 6 == 0x02) {
      continue;
    }

    // #### MULTIBYTE ####
    // How many bytes are left for this character?
    var extraLength = null;
    if (c >> 5 == 0x06) {
      extraLength = 1;
    } else if (c >> 4 == 0x0e) {
      extraLength = 2;
    } else if (c >> 3 == 0x1e) {
      extraLength = 3;
    } else if (c >> 2 == 0x3e) {
      extraLength = 4;
    } else if (c >> 1 == 0x7e) {
      extraLength = 5;
    } else {
      continue;
    }

    // Do we have enough bytes in our data?
    if (i+extraLength > len) {
      var leftovers = array.slice(i-1);

      // If there is an invalid byte in the leftovers we might want to
      // continue from there.
      for (; i < len; i++) if (array[i] >> 6 != 0x02) break;
      if (i != len) continue;

      // All leftover bytes are valid.
      return {result: out, leftovers: leftovers};
    }
    // Remove the UTF-8 prefix from the char (res)
    var mask = (1 << (8 - extraLength - 1)) - 1,
        res = c & mask, nextChar, count;

    for (count = 0; count < extraLength; count++) {
      nextChar = array[i++];

      // Is the char valid multibyte part?
      if (nextChar >> 6 != 0x02) {break;};
      res = (res << 6) | (nextChar & 0x3f);
    }

    if (count != extraLength) {
      i--;
      continue;
    }

    if (res <= 0xffff) {
      out += String.fromCharCode(res);
      continue;
    }

    res -= 0x10000;
    var high = ((res >> 10) & 0x3ff) + 0xd800,
        low = (res & 0x3ff) + 0xdc00;
    out += String.fromCharCode(high, low);
  }

  return {result: out, leftovers: []};
}

This returns {result: "parsed string", leftovers: [list of invalid bytes at the end]} in case you are parsing the string in chunks.

EDIT: fixed the issue that @unhammer found.
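For comparison, where the built-in Encoding API is available, the same "parse in chunks" need can be covered by `TextDecoder` with the `stream` option, which keeps an incomplete trailing sequence inside the decoder until the next chunk arrives. A sketch under that assumption; `decodeChunks` is a made-up name, not part of this answer:

```javascript
// Hypothetical helper: decodes a list of byte chunks where a multi-byte
// character may be split across two chunks.
function decodeChunks(chunks) {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const chunk of chunks) {
    // stream: true buffers an incomplete sequence at the end of the chunk.
    out += decoder.decode(Uint8Array.from(chunk), { stream: true });
  }
  // Final call without arguments flushes whatever is still buffered.
  out += decoder.decode();
  return out;
}

// The two bytes of "ö" (195, 182) are split across the chunks.
console.log(decodeChunks([[71, 114, 195], [182, 195, 159, 101]])); // prints "Größe"
```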

fakedrake
  • 1
    When I try this with [195,165] I get `{"result":"","leftovers":[195, 165]}` while @Albert's gives "å" – unhammer Nov 14 '16 at 11:48
  • You are right, I fixed it in my project but not in this post. Sorry about my neglect. – fakedrake Nov 14 '16 at 17:49
  • No problem, seems to work now :-) Kinda funny that it already got two upvotes before anyone tested it though :-) Now `utf8ArrayToStr([240,159,154,133])` gives me my "🚅" – unhammer Nov 15 '16 at 08:36
6

// String to Utf8 ByteBuffer

function strToUTF8(str){
  return Uint8Array.from(encodeURIComponent(str).replace(/%(..)/g,(m,v)=>{return String.fromCodePoint(parseInt(v,16))}), c=>c.codePointAt(0))
}

// Utf8 ByteArray to string

function UTF8toStr(ba){
  // '' is the initial accumulator; each byte is padded to two hex digits
  return decodeURIComponent(ba.reduce((p,c)=>p+'%'+c.toString(16).padStart(2,'0'),''))
}
user9642681
1

Using my 1.6KB library, you can do

ToString(FromUTF8(Array.from(usernameReceived)))
MCCCS
1

This is a solution with extensive error reporting.

It takes a UTF-8 encoded byte array (represented as an array of numbers, each an integer between 0 and 255 inclusive) and produces a JavaScript string of Unicode characters.

function getNextByte(value, startByteIndex, startBitsStr, 
                     additional, index) 
{
    if (index >= value.length) {
        var startByte = value[startByteIndex];
        throw new Error("Invalid UTF-8 sequence. Byte " + startByteIndex 
            + " with value " + startByte + " (" + String.fromCharCode(startByte) 
            + "; binary: " + toBinary(startByte)
            + ") starts with " + startBitsStr + " in binary and thus requires " 
            + additional + " bytes after it, but we only have " 
            + (value.length - startByteIndex) + ".");
    }
    var byteValue = value[index];
    checkNextByteFormat(value, startByteIndex, startBitsStr, additional, index);
    return byteValue;
}

function checkNextByteFormat(value, startByteIndex, startBitsStr, 
                             additional, index) 
{
    if ((value[index] & 0xC0) != 0x80) {
        var startByte = value[startByteIndex];
        var wrongByte = value[index];
        throw new Error("Invalid UTF-8 byte sequence. Byte " + startByteIndex 
             + " with value " + startByte + " (" +String.fromCharCode(startByte) 
             + "; binary: " + toBinary(startByte) + ") starts with " 
             + startBitsStr + " in binary and thus requires " + additional 
             + " additional bytes, each of which should start with 10 in binary."
             + " However byte " + (index - startByteIndex) 
             + " after it with value " + wrongByte + " (" 
             + String.fromCharCode(wrongByte) + "; binary: " + toBinary(wrongByte)
             +") does not start with 10 in binary.");
    }
}

function fromUtf8 (str) {
        var value = [];
        var destIndex = 0;
        for (var index = 0; index < str.length; index++) {
            var code = str.charCodeAt(index);
            if (code <= 0x7F) {
                value[destIndex++] = code;
            } else if (code <= 0x7FF) {
                value[destIndex++] = ((code >> 6 ) & 0x1F) | 0xC0;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0xFFFF) {
                value[destIndex++] = ((code >> 12) & 0x0F) | 0xE0;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x1FFFFF) {
                value[destIndex++] = ((code >> 18) & 0x07) | 0xF0;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x03FFFFFF) {
                value[destIndex++] = ((code >> 24) & 0x03) | 0xF8; // 5-byte lead byte is 111110xx
                value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else if (code <= 0x7FFFFFFF) {
                value[destIndex++] = ((code >> 30) & 0x01) | 0xFC;
                value[destIndex++] = ((code >> 24) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 18) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 12) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 6 ) & 0x3F) | 0x80;
                value[destIndex++] = ((code >> 0 ) & 0x3F) | 0x80;
            } else {
                throw new Error("Unsupported Unicode character \"" 
                    + str.charAt(index) + "\" with code " + code + " (binary: " 
                    + toBinary(code) + ") at index " + index
                    + ". Cannot represent it as UTF-8 byte sequence.");
            }
        }
        return value;
    }
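The code above calls `toBinary`, which is not defined anywhere in the post. A minimal stand-in, assuming it is only meant to render a number as a binary string for the error messages (the zero-padding to 8 digits is my own choice):

```javascript
// Hypothetical helper, not part of the original answer: formats a number as
// a binary string, left-padded with zeros to at least 8 digits.
function toBinary(value) {
  return value.toString(2).padStart(8, "0");
}

console.log(toBinary(195)); // prints "11000011"
console.log(toBinary(5));   // prints "00000101"
```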
1

You can use decodeURI for this.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURI

As simple as this:

decodeURI('https://developer.mozilla.org/ru/docs/JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B');
// "https://developer.mozilla.org/ru/docs/JavaScript_шеллы"

Consider wrapping the call in a try/catch block so that a URIError does not go unhandled.

It is also supported in all browsers.
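A minimal sketch of such a try/catch wrapper, assuming you would rather get `null` than an exception for malformed input (the name `safeDecodeURI` and the `null` convention are assumptions made for this example):

```javascript
// Hypothetical wrapper: returns the decoded string, or null when the input
// contains a malformed percent-escape or invalid UTF-8.
function safeDecodeURI(uri) {
  try {
    return decodeURI(uri);
  } catch (e) {
    if (e instanceof URIError) {
      return null;
    }
    throw e; // anything else is unexpected, so pass it on
  }
}

console.log(safeDecodeURI("JavaScript_%D1%88%D0%B5%D0%BB%D0%BB%D1%8B")); // prints "JavaScript_шеллы"
console.log(safeDecodeURI("%")); // prints null
```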

Vadim Shvetsov
1
const decoder = new TextDecoder();
console.log(decoder.decode(new Uint8Array([97])));


MDN resource link

Royer Adames
0

I reckon the easiest way would be to use the built-in JS functions decodeURI() / encodeURI().

function updateUser(usernameSent) {
  var usernameEncoded = usernameSent; // Current value: utf8
  var usernameDecoded = decodeURI(usernameEncoded);  // Decoded
  // do stuff
}
Kasparow
-3

I searched for a simple solution and this works well for me:

//input data
var view = new Uint8Array(data);

//output string
var serialString = ua2text(view);

//convert each byte to one character (this matches UTF-8 only for ASCII bytes)
function ua2text(ua) {
    var s = "";
    for (var i = 0; i < ua.length; i++) {
        s += String.fromCharCode(ua[i]);
    }
    return s;
}

Only issue I have is sometimes I get one character at a time. This might be by design with my source of the arraybuffer. I'm using https://github.com/xseignard/cordovarduino to read serial data on an android device.

Adween
Evan Grant
  • 1
    This does not actually decode UTF-8. For example, `C3 BC` should be decoded as `ü`, but your answer returns `Ã¼`. – phihag Jan 02 '16 at 14:19
-3

Preferably, as others have suggested, use the Encoding API. But if you need to support IE (for some strange reason) MDN recommends this repo FastestSmallestTextEncoderDecoder

If you need to make use of the polyfill library:

    import {encode, decode} from "fastestsmallesttextencoderdecoder";

Then (regardless of the polyfill) for encoding and decoding:

    // takes in USVString and returns a Uint8Array object
    const encoded = new TextEncoder().encode('€')
    console.log(encoded);

    // takes in an ArrayBuffer or an ArrayBufferView and returns a DOMString
    const decoded = new TextDecoder().decode(encoded);
    console.log(decoded);
geremews
  • A link to a solution is welcome, but please ensure your answer is useful without it: [add context around the link](//meta.stackexchange.com/a/8259) so your fellow users will have some idea what it is and why it is there, then quote the most relevant part of the page you are linking to in case the target page is unavailable. [Answers that are little more than a link may be deleted.](/help/deleted-answers) – 10 Rep May 06 '21 at 00:03