10

I need to estimate the DISK size of a text string (which could be raw text or a Base64-encoded string for an image, audio, etc.) in JavaScript, and I'm not sure how. The only thing I can find by Googling is .length, so I thought someone on StackOverflow might know...

The reason I need to know is that I have a localStorage script that needs (or would love to have) the ability to check when a user is nearing their 5MB (or 10MB in IE) quota and prompt them to increase the max size for the domain. So if a user hits, let's say, 4.5MB of data, it would prompt with:

You're nearing your browser's 5MB data cap. Please increase your max data by... [instructions on increasing it for the browser]

Oscar Godson
  • 1
    I am not sure what your question is. Can you rephrase? And: Why don't you try what you found? – jwueller Nov 29 '10 at 22:25
  • @elusive; Because the length of a string is different than the disk space it occupies. – Chris Laplante Nov 29 '10 at 22:26
  • Oops! I'm sorry, I forgot to add the keyword there... I added it now in caps and italics so no one misses it. I totally forgot to add that :( – Oscar Godson Nov 29 '10 at 22:27
  • @SimpleCoder, a string doesn't occupy any disk space. It occupies memory so I don't quite get your comment. @Oscar Godson, same remark to you so that you clarify your question as right now it doesn't make any sense and I am tempted to vote to close as not a real question. – Darin Dimitrov Nov 29 '10 at 22:27
  • 1
    @SimpleCoder: Disc space? Are you talking about RAM? Then no, there is no way of determining that. At least not implementation-independent, since this is not part of the spec. – jwueller Nov 29 '10 at 22:28
  • No, disk space a given string would occupy. Let me add more details to my post... 1 second. – Oscar Godson Nov 29 '10 at 22:29
  • @Darin Dimitrov & @elusive: It does when it is on a disk, which is what the asker is asking us to help him estimate. – Chris Laplante Nov 29 '10 at 22:30
  • @SimpleCoder, yes but a string is **NEVER** stored on disk. Only raw bytes are stored on disk. A string has absolutely no meaning to a disk. I would recommend you reading about how file systems work. Reading about encodings will be a plus. – Darin Dimitrov Nov 29 '10 at 22:31
  • 1
    @Darin Dimitrov; Don't patronize me. You are missing the point. The asker wants to estimate the disk space a string would occupy **IF** it was placed on a disk. – Chris Laplante Nov 29 '10 at 22:32
  • I repeat, a string is never placed on a disk!!! A string needs to be encoded before being able to be placed on a disk. WTF? – Darin Dimitrov Nov 29 '10 at 22:33
  • Please see my updated question. In this instance, it IS on the disk via localStorage and need to know when the user is nearing the end of his browser's quota. Sorry for the confusion... – Oscar Godson Nov 29 '10 at 22:34
  • @Oscar Godson, local storage is something totally browser dependent. Some browser even might use a SQLite database, so...? – Darin Dimitrov Nov 29 '10 at 22:34
  • @Darin Dimitrov; Ok then, if you want to get pedantic about it: "The asker wants to estimate the disk space a string would occupy **IF** it was placed on a disk using a specific encoding." – Chris Laplante Nov 29 '10 at 22:35
  • 1
    @SimpleCoder, OK, now that makes sense. The next question: what encoding? – Darin Dimitrov Nov 29 '10 at 22:36
  • @Darin Dimitrov: That is a question for the asker; not me. – Chris Laplante Nov 29 '10 at 22:36
  • @SimpleCoder, yes but as you are paraphrasing the OP's question I thought you might know. – Darin Dimitrov Nov 29 '10 at 22:37
  • @Darin Dimitrov: I'm sorry, I do not know which encoding the OP intends to use. – Chris Laplante Nov 29 '10 at 22:37
  • Is there a way to check? Because the app is available everywhere, couldn't the encoding be anything? I'm sorry, I haven't read into encoding much yet... – Oscar Godson Nov 29 '10 at 22:39
  • @Oscar Godson, yes it could be anything, it might not even be stored on a disk, it is totally browser dependent. The HTML5 specification doesn't say how local storage should be implemented, so I guess you cannot reliably achieve what you are looking for cross browser. – Darin Dimitrov Nov 29 '10 at 22:40
  • @Darin Dimitrov Utter nonsense. When you write a string to a file, that string is on disk, right? Of course it is bytes, but it is bytes in memory as well. Everything on your computer is bytes, including your string data, your string object, your JavaScript compiler, right down to the very registers in your processor, so stating that it's not a string because it is bytes is nonsense. And yes, the string needs to be encoded, but a string is always encoded somehow. It is encoded in memory as well. – GolezTrol Nov 29 '10 at 22:41
  • @Darin Dimitrov SQLite is part of the Web Database API, which is separate from the Web Storage API; both are being worked on by the W3C. The size of their storage per domain could easily be estimated if I knew how to estimate the size of a text string, which (based on the answers and comments) depends on the encoding. – Oscar Godson Nov 29 '10 at 22:42
  • 1
    @GolezTrol, *When you write a string to a file, that string is on disk, right?* Absolutely not: you never can write a string to a file. You need to convert it to bytes first before writing it to a file and to convert a string to bytes you need to specify an encoding of course. – Darin Dimitrov Nov 29 '10 at 22:42
  • @Darin Dimitrov I can retrieve all the data per domain via the WS API. That's all I need to estimate. Currently WebKit and Mozilla give 5MB (to match the spec) and IE gives 10MB. I can easily estimate the size and prompt the user when nearing this cap. – Oscar Godson Nov 29 '10 at 22:44
  • @Darin Dimitrov: It is assumed that writing something to a file means writing the bytes; what else are you going to write? – Chris Laplante Nov 29 '10 at 22:44
  • 1
    And indeed, the disc doesn't know it is a string, but that's not the point, is it? A disc doesn't know what a bitmap is either and still I'm capable of storing a 40Gb photo collection on my disc. – GolezTrol Nov 29 '10 at 22:44
  • 2
    @Darin Dimitrov: I would suggest *you* go read a book about computers sometime soon instead of stating that everybody around you is stupid. – GolezTrol Nov 29 '10 at 22:46
  • Well, my point is that you could implement this *writing of a string to a disk* in many different ways. For example, you could convert the string to bytes using UTF-8 and then gzip it to save disk space. The size this string occupies on disk would then be completely different than if you had, for example, converted it to UTF-8 bytes and written those bytes directly to disk, and yet you would be storing the same string on disk, wouldn't you? @GolezTrol, I never stated that everybody around me is stupid. Please don't tell lies. – Darin Dimitrov Nov 29 '10 at 22:47
  • @Darin Dimitrov: So you're saying that if I sent you to the supermarket to get me some beer, you will tell me you can't buy beer in a supermarket, while you actually *mean* you don't know which brand of beer to buy and whether to buy a single can, a sixpack or a tray. Wouldn't it be better to just ask more specifically what I want, instead of telling me it's impossible? – GolezTrol Nov 29 '10 at 22:50
  • @GolezTrol, I don't see how your example relates to my point. I ask you to point me in the HTML5 specification where it says how local storage should be implemented, i.e. how it *should be stored on disk*. – Darin Dimitrov Nov 29 '10 at 22:51
  • @Darin Dimitrov ah, so you're trying to say the 5MB cap is 5MB, but that some browsers (might) gzip so that the same 5MB is actually 10MB uncompressed? – Oscar Godson Nov 29 '10 at 22:53
  • @Oscar Godson, no, I don't say this at all. I just said that this is not specified in HTML5 so it is up to the people writing this browser to decide, that's all. The gzip was just an example which came to mind. – Darin Dimitrov Nov 29 '10 at 22:54
  • Yeah, so the script is just sniffing the useragent and the current implementations of that. The script can be easily updated if Mozilla or Webkit decides to let's say, change to 6MBs or 10MBs, but as of now including FF4 and IE9 the max cap is 5 & 10MBs – Oscar Godson Nov 29 '10 at 22:56
  • And if there is no match, let's say some random offbrand browser "Dan's Super Cool Web Engine" comes along there just wouldn't be any prompt. It's just a "nice" thing to have for the browsers that we/I know the max limit on their localStorage – Oscar Godson Nov 29 '10 at 22:58
  • @Oscar Godson, yes, I agree, now that you've clearly specified what you are looking for it makes a great question. +1 by the way. Unfortunately I don't know how you could achieve this reliably and I would be more than happy to learn it. Waiting for good answers... – Darin Dimitrov Nov 29 '10 at 23:00

5 Answers

3

It is going to depend on your character encoding. If the string contains only ASCII characters and you use ASCII encoding, it's going to be str.length bytes. If you use UTF-16, it's going to be (str.length * 2) bytes, since str.length counts UTF-16 code units. If you use UTF-8, it is going to depend on the characters in the string. (Some characters will only take 1 byte, but others could take up to 4 bytes.) If you're dealing with Base64-encoded data, the characters are all within the ASCII range and therefore would occupy str.length bytes on disk when stored as ASCII or UTF-8. If you decode them first and save as binary, it would take about (str.length * 3/4) bytes, minus up to 2 bytes if the string ends in "=" padding. (With Base64, 3 unencoded bytes become 4 encoded characters.)
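As a rough illustration of the UTF-16 and UTF-8 rules above, here is a minimal sketch. The function names utf16Bytes and utf8Bytes are made up for this example, and it assumes no byte-order mark or storage overhead is added on top of the encoded bytes.

```javascript
// Hypothetical helpers for illustration; not part of any library.

// UTF-16: str.length counts 16-bit code units, and each one takes 2 bytes.
function utf16Bytes(str) {
    return str.length * 2;
}

// UTF-8: each code point takes 1 to 4 bytes depending on its value.
function utf8Bytes(str) {
    var total = 0;
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
        if (code < 0x80) {
            total += 1; // ASCII range
        } else if (code < 0x800) {
            total += 2; // e.g. accented Latin letters
        } else if (code >= 0xD800 && code <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
            total += 4; // surrogate pair: one code point outside the BMP
            i++;        // skip the low surrogate
        } else {
            total += 3; // rest of the BMP; a lone surrogate is also counted as 3
        }
    }
    return total;
}
```

For 'tâ', utf16Bytes gives 4 and utf8Bytes gives 3, which shows how the same string can have different sizes on disk.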

BTW - If you haven't read Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), you should do so immediately.

http://www.joelonsoftware.com/articles/Unicode.html

UPDATE: If you're using localStorage, I assume that you're familiar with window.localStorage.length, though this only tells you how many key/value pairs are stored, not how many bytes they use or whether your new data will fit. I would also highly recommend reading Dive into HTML5, especially the section on storage:

http://diveintohtml5.ep.io/storage.html

Unless something has changed since its writing, I'm not sure what you can do as localStorage limits you to 5MB per domain with no way for the user to increase it.
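If all you want is a warning when usage gets close to the cap, something like the sketch below might work. It is only an estimate: it assumes every character of every key and value costs 2 bytes (UTF-16), which the spec does not guarantee, and the names estimateStorageBytes and isNearQuota are hypothetical. It relies only on the standard length, key() and getItem() members of a Storage object.

```javascript
// Sketch: estimate the bytes used by a Storage-like object such as window.localStorage.
// ASSUMPTION: 2 bytes per character (UTF-16); real accounting is browser-specific.
function estimateStorageBytes(storage) {
    var chars = 0;
    for (var i = 0; i < storage.length; i++) {
        var key = storage.key(i);
        var value = storage.getItem(key) || '';
        chars += key.length + value.length;
    }
    return chars * 2;
}

// ASSUMPTION: the caller supplies the quota and the warning threshold (e.g. 0.9 for 90%).
function isNearQuota(storage, quotaBytes, threshold) {
    return estimateStorageBytes(storage) >= quotaBytes * threshold;
}
```

In a browser you could call isNearQuota(window.localStorage, 5 * 1024 * 1024, 0.9) and show the prompt when it returns true.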

DanBeale
James Kovacs
2

It's not going to be exact, but you can count the number of bytes in a string to get a rough estimation.

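function bytes(string) {
    // encodeURI escapes each non-ASCII character as its UTF-8 bytes, one "%XX" per byte.
    var escaped_string = encodeURI(string);
    var count;
    if (escaped_string.indexOf("%") != -1) {
        // Number of "%XX" escapes, i.e. the number of escaped bytes.
        count = escaped_string.split("%").length - 1;
        // Add the unescaped characters, which take one byte each.
        count = count + (escaped_string.length - (count * 3));
    }
    else {
        count = escaped_string.length;
    }

    return count;
}

var mystring = 'tâ';
alert(bytes(mystring)); // 3: "t" is 1 byte and "â" is 2 bytes in UTF-8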

simshaun
0

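You can estimate the decoded size in bytes of a Base64 data URI (here a PNG held in the variable string) in this simple way. It can overestimate by a byte or two when the Base64 data ends in "=" padding.

var head = 'data:image/png;base64,';
// Drop the header, then apply the Base64 ratio: 4 characters decode to 3 bytes.
var imgFileSize = Math.round((string.length - head.length) * 3 / 4);

console.log("size is ", imgFileSize);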
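A slightly more general variant, as a sketch only: it looks for the comma instead of hard-coding the PNG header, and subtracts the "=" padding. The name dataUriBytes is hypothetical, and it assumes well-formed, padded Base64 with no whitespace in it.

```javascript
// Hypothetical helper: decoded size in bytes of a Base64 data URI.
// ASSUMPTION: well-formed, padded Base64 with no line breaks or spaces.
function dataUriBytes(dataUri) {
    // Everything after the first comma is the Base64 payload.
    var payload = dataUri.slice(dataUri.indexOf(',') + 1);
    var padding = 0;
    if (payload.slice(-2) === '==') {
        padding = 2;
    } else if (payload.slice(-1) === '=') {
        padding = 1;
    }
    // Every 4 Base64 characters decode to 3 bytes, minus the padded bytes.
    return (payload.length / 4) * 3 - padding;
}
```

Because it splits on the comma, the same function should work for other MIME types (audio, text, and so on), not just image/png.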
Shweta Matkar
0

If you are talking about memory usage, then no. There is no way of reliably determining the used memory (at least implementation-independently), since this is not part of the ECMAScript spec. It depends on your character encoding.

jwueller
0

It depends on the data in your string and the way it is stored. If your Base64-encoded string is stored as a Base64-encoded string, the length is the same as the size on disk. If not, you have to decode it first.

I found a solution (although it seems a bit icky) here:

function checkLength() {
    var countMe = document.getElementById("someText").value;
    var escapedStr = encodeURI(countMe);
    if (escapedStr.indexOf("%") != -1) {
        var count = escapedStr.split("%").length - 1;
        if (count == 0) count++;  //perverse case; can't happen with real UTF-8
        var tmp = escapedStr.length - (count * 3);
        count = count + tmp;
    } else {
        count = escapedStr.length;
    }
    alert(escapedStr + ": size is " + count);
}
GolezTrol