
In a Firefox addon I am caching lengthy strings to disk. I would like to be able to give users some idea of how much disk space in bytes these strings are taking up.

I understand that JavaScript stores strings as UTF-16. If a UTF-8 string is saved in a variable, it is converted to UTF-16, so methods that measure the size of a UTF-8 string will not do here.

The MDN reference at

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length#Description

states that the value of string.length is the number of UTF-16 code units, not the number of characters.

From this I infer that the disk space in bytes would simply be string.length * 2. I am looking for confirmation as to whether my assumption is correct.
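
To make the arithmetic concrete, here is a minimal sketch of what I mean. The helper name `utf16ByteLength` is made up for illustration. The result only matches bytes on disk if the cache is actually written as UTF-16 without a byte order mark, which is an assumption about how the file is written; file system allocation overhead is ignored.

```javascript
// Hypothetical helper: size in bytes of a string's UTF-16 representation.
// Assumes 2 bytes per code unit and no byte order mark.
function utf16ByteLength(str) {
  return str.length * 2; // length counts UTF-16 code units, not characters
}

console.log(utf16ByteLength("abc")); // 6 (3 code units)
console.log(utf16ByteLength("コ"));  // 2 (1 code unit)
console.log(utf16ByteLength("😀")); // 4 (2 code units, a surrogate pair)
```

A character outside the Basic Multilingual Plane counts as two code units in `length`, so it still comes out as 4 bytes here.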

EDIT:

(Several edits made to the title and original text. Also, the following:)

It was suggested that this is a duplicate of [How many bytes in a JavaScript string?](https://stackoverflow.com/questions/2219526/how-many-bytes-in-a-javascript-string). However, that question does not address mine: its answers describe ways of getting the size of UTF-8 strings, whereas JavaScript converts UTF-8 strings to UTF-16 when it stores them. For example, a character that takes up 3 bytes in UTF-8 may use only 2 bytes (1 UTF-16 code unit) when converted to UTF-16.
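
The difference between the two encodings can be checked directly. This is a minimal sketch, assuming an environment where the standard `TextEncoder` is available (it always encodes to UTF-8); `compareSizes` is a hypothetical name used only here.

```javascript
// Hypothetical helper comparing the UTF-8 and UTF-16 sizes of the same string.
function compareSizes(str) {
  return {
    utf8Bytes: new TextEncoder().encode(str).length, // TextEncoder always produces UTF-8
    utf16Bytes: str.length * 2,                      // 2 bytes per UTF-16 code unit
  };
}

console.log(compareSizes("コ")); // utf8Bytes: 3, utf16Bytes: 2
console.log(compareSizes("a"));  // utf8Bytes: 1, utf16Bytes: 2
```

Which of the two numbers corresponds to disk usage depends on the encoding used when the string is written out.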

KevinHJ
  • You missed the part of the sentence "...uses a single 16-bit code unit to represent the most common characters...", so no, your assumption is not correct. – Heretic Monkey Sep 15 '20 at 14:54
  • Does this answer your question? [How many bytes in a JavaScript string?](https://stackoverflow.com/questions/2219526/how-many-bytes-in-a-javascript-string) – Heretic Monkey Sep 15 '20 at 14:57
  • @HereticMonkey and the other part: "...but needs to use two code units for less commonly-used characters, so it's possible for the value returned by length to not match the actual number of characters in the string." The point is, string.length is counting code units, NOT characters. So please tell me exactly how the statement you included invalidates my assumption. – KevinHJ Sep 15 '20 at 16:06
  • @HereticMonkey well, not really. It talks mainly about the number of UTF-8 bytes in a string, while JavaScript stores strings as UTF-16. – KevinHJ Sep 15 '20 at 16:10
  • There are 13 answers on the duplicate... [This answer uses a `Blob` to get the size](https://stackoverflow.com/a/52254083/215552). – Heretic Monkey Sep 15 '20 at 16:12
  • @HereticMonkey The character コ is \xe3\x82\xb3 in UTF-8 (3 bytes) and \x30\xb3 in UTF-16 (2 bytes). So I don't see how getting the total bytes using UTF-8 decoding tools will give you the total bytes stored as a UTF-16 string. – KevinHJ Sep 15 '20 at 16:21
  • This sounds like a problem you're having with answer(s) on the other question. You asked how to get the size of a string in bytes. That question has many answers on how to do so. If you don't think it answers your question, [edit] your question to include why not, as detailed in [“This question already has answers here” - but it does not. What can I do when I think my question's not a duplicate?](https://meta.stackoverflow.com/q/252252/215552) – Heretic Monkey Sep 15 '20 at 16:41
  • @HereticMonkey "You asked how to get the size of a string in bytes" No, I asked "How to determine the actual memory or disk space used by a string in Javascript", gave a link describing Javascript strings as UTF-16 (NOT UTF-8, like all the answers in that article are referring to.), and repeatedly referred to "disk space" and not "size" in my question. I don't see there is any reason for me to edit my question, unless it is to make pre-emptive comments for those who will make incorrect assumptions about what I am asking. Are you not aware that Javascript converts UTF-8 strings into UTF-16? – KevinHJ Sep 15 '20 at 17:58
  • From the question: "how much disk space in bytes". I'm not sure how translating that to "how many bytes a string takes" is unwarranted. Of course, if you *really* want to get into *disk* space used, then you'll have to specify the file system and the allocation unit size. Are you not aware that affects how much space a particular file uses on disk? – Heretic Monkey Sep 15 '20 at 18:19
  • Or, you could just edit the question and add a few words about how that other question doesn't answer yours. Up to you. I'm out. – Heretic Monkey Sep 15 '20 at 18:20
  • @HereticMonkey, thanks, I took your suggestion and (heavily) edited my question. – KevinHJ Sep 15 '20 at 18:51

0 Answers