JavaScript truncate text by bytes length

Question

I want to truncate a piece of UTF-8 encoded text to a given length in bytes. For example, if the text is

Hello , I like rice cakes ¯\_(ツ)_/¯

I would like to truncate that text to 10 bytes max.

I found the truncate-utf8-bytes NPM module, which does exactly what I need. Unfortunately, the project I am working on doesn't use webpack or Browserify, so as far as I'm aware I cannot use NPM modules.

So I was wondering if there was a reliable way to truncate the text, or if there was a way for me to use the truncate-utf8-bytes module in the browser.

Thanks

Have you checked https://stackoverflow.com/questions/1515884/using-javascript-to-truncate-text-to-a-certain-size-8-kb? — Shinjo, Sep 03 '19 at 10:09
@Shinjo Yes I have, but I read that the solutions there are deprecated. Also, I need solutions that take multi-byte characters and surrogate pairs into account. — Uchenna Okafor, Sep 03 '19 at 10:10
Really? It works fine, FWIW: https://jsfiddle.net/0xmcauqw/ Could you add your desired output? You mention multi-byte characters and surrogate pairs: what is your input and expected output? Also, what is your current progress? Please post a minimal reproducible example. — Shinjo, Sep 03 '19 at 10:14
Did you not read the package's documentation? "*[A browser implementation](https://github.com/parshap/truncate-utf8-bytes/blob/master/browser.js) that doesn't use Buffer.byteLength is provided*" (using [this](https://github.com/parshap/utf8-byte-length/blob/master/browser.js) and [that](https://github.com/parshap/truncate-utf8-bytes/blob/master/lib/truncate.js)). If your project doesn't use a bundler, that means you have to bundle manually, but the code is still there. — Bergi, Sep 03 '19 at 10:15
how about something like [`this`](https://jsbin.com/dabumix/edit?js,console) — Code Maniac, Sep 03 '19 at 10:17
@Bergi I did. My interpretation was that it doesn't use the Buffer library, which is a Node.js module; hence "browser", because browsers don't have those modules. — Uchenna Okafor, Sep 03 '19 at 10:19
Anyways, I have found an answer. Thanks @CodeManiac and Shinjo — Uchenna Okafor, Sep 03 '19 at 10:23

score 2 · Accepted Answer · answered Sep 03 '19 at 10:28

Something like this should work, assuming you know the encoding of the text:

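// Note: '\\_' keeps the backslash; a bare '\_' in a JS string literal would drop it.
let str = 'Hello , I like rice cakes ¯\\_(ツ)_/¯';
let enc = new TextEncoder();           // always encodes to UTF-8
let dec = new TextDecoder('utf-8');
let uint8 = enc.encode(str);           // Uint8Array, one element per byte
let section = uint8.slice(0, 11);      // keep the first 11 bytes
let result = dec.decode(section);      // a split character decodes to U+FFFD
console.log('result', result);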

Gavin

I think this is exactly what I was looking for. I am already using TextDecoder to check the length in bytes, I just didn't know there was a TextEncoder. Thank you so much. Quick question, the uint8.slice(0, 11), does each array item represent one byte? – Uchenna Okafor Sep 03 '19 at 10:32
Yup, in that example uint8 is an array of 8-bit bytes (which is what you'd want for UTF-8). Note that slice will cut through any character that uses multiple bytes. As I understand it, TextDecoder still makes the result safe to render by substituting the replacement character U+FFFD for the broken bytes, but you might want to check that for your purposes. – Gavin Sep 03 '19 at 10:37
Also note: TextEncoder only supports utf8 now (https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder) but there is a polyfill for other encodings. – Gavin Sep 03 '19 at 10:39
This solution will not correctly truncate multi-byte characters - for example if you do `slice(0,8)` instead, you will get a corrupted character: `'Hello �'` – Mikael Finstad Jul 30 '23 at 12:11
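One way to address the corruption noted in these comments is to move the cut back to a character boundary before decoding. The following is only a sketch, not the answer author's code: `truncateUtf8` is a hypothetical helper name, and it relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx.

```javascript
// Hypothetical helper (not from the answer): truncate to at most maxBytes
// UTF-8 bytes without cutting a character in half.
function truncateUtf8(str, maxBytes) {
  const bytes = new TextEncoder().encode(str);
  if (bytes.length <= maxBytes) return str; // already short enough

  let end = Math.max(0, maxBytes);
  // bytes[end] is the first byte that would be dropped. If it is a
  // continuation byte (10xxxxxx), the cut falls inside a character, so step
  // back until the first dropped byte is that character's lead byte.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;

  return new TextDecoder('utf-8').decode(bytes.subarray(0, end));
}

console.log(truncateUtf8('Hello ツ!', 8)); // "Hello " (ツ needs 3 bytes, only 2 fit)
console.log(truncateUtf8('Hello ツ!', 9)); // "Hello ツ"
```

The result may be shorter than the byte limit, since a character that does not fit completely is dropped rather than split.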

score 2 · Answer 2 · edited Nov 05 '21 at 08:51

Answer 1 works well, but consider adding this at the end to strip the replacement character (U+FFFD) left behind when a character is truncated midway:

result = result.replace(/\uFFFD/g, '');
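One caveat: the global replace above also removes any U+FFFD that was already present in the input. A possible alternative, sketched here under the assumption that the input is an ordinary JavaScript string, is to never produce a partial character in the first place: walk the string by code point and stop before the byte budget is exceeded. Both function names below are hypothetical.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one code point.
function utf8Length(codePoint) {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Hypothetical helper: keep whole code points while they fit in maxBytes.
function truncateByCodePoint(str, maxBytes) {
  let used = 0;
  let out = '';
  // for...of iterates by code point, so a surrogate pair stays together.
  for (const ch of str) {
    const size = utf8Length(ch.codePointAt(0));
    if (used + size > maxBytes) break;
    used += size;
    out += ch;
  }
  return out;
}

console.log(truncateByCodePoint('a😀b', 4)); // "a" (the emoji needs 4 more bytes)
console.log(truncateByCodePoint('a😀b', 5)); // "a😀"
```

This version needs neither TextEncoder nor TextDecoder, which may matter on older browsers.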

edited Nov 05 '21 at 08:51

drnugent

answered Nov 04 '21 at 17:13

goebel02
