5

I opened a new file in Notepad and typed the sentence "Four score and seven years ago" into it, without the quotes.

Four              4 characters
score             5 characters
and               3 characters
seven             5 characters 
years             5 characters 
ago               3 characters

TOTAL : 25 + 5 spaces = 30 characters.

I saved the file to disk under the name gettingSize.txt and then looked at its size. The file has a size of 30 bytes: 1 byte for each character. As a rule, each character consumes a byte.

Size : 30 bytes
Size on Disk : 4.00 KB (4,096 bytes)

The paragraphs below are copied from a PDF.

If you were to look at the file as a computer looks at it, you would find that each byte contains not a letter but a number -- the number is the ASCII code corresponding to the character (see below). So on disk, the numbers for the file look like this:

F o u r (space) a n d (space) s e v e n

70 111 117 114 32 97 110 100 32 115 101 118 101 110

By looking in the ASCII table, you can see a one-to-one correspondence between each character and the ASCII code used. Note the use of 32 for a space -- 32 is the ASCII code for a space. We could expand these decimal numbers out to binary numbers (so 32 = 00100000) if we wanted to be technically correct -- that is how the computer really deals with things.
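The character-to-number correspondence described above can be checked with a few lines of Java. This is a sketch for illustration only; `AsciiCodesDemo`, `asciiCodes` and `toBinary` are names invented here, not something from the quoted PDF.

```java
import java.nio.charset.StandardCharsets;

class AsciiCodesDemo {

    // Returns the ASCII codes of the text as space-separated decimal numbers.
    static String asciiCodes(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF); // view the byte as an unsigned number
        }
        return sb.toString();
    }

    // Formats one byte value as eight binary digits, padded with leading zeros.
    static String toBinary(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        // 70 111 117 114 32 97 110 100 32 115 101 118 101 110
        System.out.println(asciiCodes("Four and seven"));
        // 00100000
        System.out.println(toBinary(' '));
    }
}
```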

1) I know that everything is stored in the form of bits and bytes, so what does this mean in general: "you would find that each byte contains not a letter but a number -- the number is the ASCII code corresponding to the character"? A byte is 8 bits, so how can each byte hold a number that is the ASCII code? How can a byte contain an ASCII number (e.g. 49 for '1') when a bit can only be 0 or 1?

2) What exactly is the difference between Size and Size on Disk? And how do ASCII and Unicode fit into it?

3) In Java, Strings are objects. Can I say a String is multiple characters concatenated together? Given String str = "Four score and seven years ago", how is str stored in memory? Is it stored in the same manner as the text saved in the Notepad file?

Farhan stands with Palestine

3 Answers

7

Files are stored in blocks. If a file is smaller than the block size (4 KB in your case), it still occupies a whole block, and most of that block's space goes unused. This question was answered on Super User: https://superuser.com/questions/704218/why-is-there-such-a-big-difference-between-size-and-size-on-disk
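As a rough illustration of the rounding, here is a sketch that assumes plain fixed-size block allocation. Real file systems may behave differently (for example by storing very small files inline or compressing data), and the `sizeOnDisk` helper is a made-up name, not an OS API.

```java
class SizeOnDiskDemo {

    // Rounds a file size up to a whole number of blocks.
    // Assumption: every non-empty file occupies full blocks of blockSize bytes.
    static long sizeOnDisk(long fileSize, long blockSize) {
        long blocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
        return blocks * blockSize;
    }

    public static void main(String[] args) {
        System.out.println(sizeOnDisk(30, 4096));   // 4096
        System.out.println(sizeOnDisk(4096, 4096)); // 4096
        System.out.println(sizeOnDisk(4097, 4096)); // 8192
    }
}
```

With a 4,096-byte block, the 30-byte file from the question rounds up to 4,096 bytes, which matches the reported "Size on Disk".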


Mohammad Jafar Mashhadi
3

To make a few short points:

  1. How can a byte contain an ASCII number (e.g. 49 for '1') other than 0 and 1?

    A Byte is 8 bits. Eight bits can form 2^8 = 256 different patterns, so you can store numbers between 0 and 255 in it.

  2. What is the difference between file size and size on disk?

    See MJafar Mash's answer: "size" is the actual size in bytes and "size on disk" is the number of bytes you need to allocate as blocks for the file to be placed in.

  3. In Java Strings are Objects. Can I say that a String is multiple characters concatenated together?

    Yes, but it's actually more complicated than that:
    Taken from this answer:

    Initializes a newly created String object so that it represents the same sequence of characters as the argument; in other words, the newly created string is a copy of the argument string. Unless an explicit copy of original is needed, use of this constructor is unnecessary since Strings are immutable.

Vogel612
  • A (primitive) `byte` is 8 bits. A (wrapper) `Byte` is not 8 bits. – TheLostMind Sep 08 '14 at 07:23
  • While that is correct as far as Java goes, IIRC German network engineers, at least, **always** write Byte with a majuscule first letter to further differentiate between bits and Bytes. – Vogel612 Sep 08 '14 at 07:25
  • But that is *blatantly* wrong because a `byte` is not a `Byte`. :P – TheLostMind Sep 08 '14 at 07:26
  • You are thinking too Java here. This is waaaay below that. I am talking about OSI layer 1 and you are talking about OSI layers 6-8... – Vogel612 Sep 08 '14 at 07:28
  • ASCII uses 7-bit numbers to represent the letters, numerals and common punctuation used in American English. ASCII maps 65 to A. So when I type A on my laptop keyboard, how does the machine end up showing A on the screen? I just fail to understand this. – Farhan stands with Palestine Sep 08 '14 at 08:19
  • @ShirgillAnsari well, your keyboard will send an event to your OS. This event probably contains the Unicode code point of the key you pressed. That again corresponds 1:1 to the ASCII Code for the ASCII values (backward compatibility). And that is **already** the byte representation you want saved on your hard drive. – Vogel612 Sep 08 '14 at 10:04
  • Didn't get you. What do you mean by "That again corresponds 1:1 to the ASCII Code for the ASCII values (backward compatibility)."? I do understand the backward compatibility. And am I correct to say ASCII comprises 128 code points mapped 1:1 to the ASCII values? – Farhan stands with Palestine Sep 08 '14 at 11:18
  • What I am saying is that the first 128 code points of UTF-* are the same as the ASCII code points you have. Your keyboard provides additional possibilities outside of ASCII's scope if you use the ALT+codepoint feature. – Vogel612 Sep 08 '14 at 11:27
1

1) I know that everything is stored in the form of bits and bytes, so what does this mean in general: "you would find that each byte contains not a letter but a number -- the number is the ASCII code corresponding to the character"? A byte is 8 bits, so how can each byte hold a number that is the ASCII code? How can a byte contain an ASCII number (e.g. 49 for '1') when a bit can only be 0 or 1?

Each ASCII character occupies 1 byte. Internally, each character is stored as its ASCII number. A byte holds 8 bits, which gives 2^8 = 256 distinct values, so the largest number it can store is 2^8 - 1 = 255 and the range is 0-255. ASCII itself only uses the values 0-127.
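A small sketch of that arithmetic in Java follows; the class and method names are invented for illustration. One caveat: Java's own `byte` type is signed (-128 to 127), so the code masks with `0xFF` to get the unsigned 0-255 view discussed here.

```java
class ByteRangeDemo {

    // Number of distinct values that fit in the given number of bits (bits < 31).
    static int distinctValues(int bits) {
        return 1 << bits;
    }

    // Java's byte is signed; masking with 0xFF yields the unsigned 0..255 value.
    static int unsignedValue(byte b) {
        return b & 0xFF;
    }

    // The eight individual bits of one byte, padded with leading zeros.
    static String bits(byte b) {
        String s = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(distinctValues(8));          // 256
        byte one = (byte) '1';                          // the character '1'
        System.out.println(unsignedValue(one));         // 49
        System.out.println(bits(one));                  // 00110001
        System.out.println(unsignedValue((byte) 0xFF)); // 255
    }
}
```

The byte still consists only of 0s and 1s; the number 49 is just those eight bits, 00110001, read as a binary number.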

2) What exactly is the difference between Size and Size on Disk? And how do ASCII and Unicode fit into it?

Each ASCII character is 1 byte, so 30 bytes is the actual size of the data in the file. The 4 KB is the size of the segment/block in which the file is stored. In your case it is the minimum "new" space given to any file on the disk.

3) In Java, Strings are objects. Can I say a String is multiple characters concatenated together? Given String str = "Four score and seven years ago", how is str stored in memory? Is it stored in the same manner as the text saved in the Notepad file?

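Yes. A String is indeed (conceptually) multiple characters concatenated together, but the characters cannot be changed once the String is created. String is an object that holds its text as a sequence of characters, and in Java each char is 2 bytes (a UTF-16 code unit), so this is not the same layout as the Notepad file with its 1 byte per character. The default Charset is a separate matter: it is used when a String is converted to or from bytes, for example when writing a file. It is often UTF-8, but it depends on the platform and JVM settings. You can also change it.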
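To make the difference between characters in memory and bytes in a file concrete, here is a minimal sketch. `StringStorageDemo` and `encodedLength` are illustrative names; the charsets come from the standard `java.nio.charset.StandardCharsets` class.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StringStorageDemo {

    // Number of bytes the text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String str = "Four score and seven years ago";
        System.out.println(str.length());                                    // 30 chars
        System.out.println(Character.BYTES);                                 // 2 bytes per char
        System.out.println(encodedLength(str, StandardCharsets.US_ASCII));   // 30
        System.out.println(encodedLength(str, StandardCharsets.UTF_8));      // 30
        System.out.println(encodedLength(str, StandardCharsets.UTF_16BE));   // 60
        // A non-ASCII character (e with acute accent) needs 2 bytes in UTF-8.
        System.out.println(encodedLength("\u00e9", StandardCharsets.UTF_8)); // 2
        // The platform default charset; the result depends on the machine.
        System.out.println(Charset.defaultCharset());
    }
}
```

For pure ASCII text, UTF-8 produces exactly the same bytes as ASCII, which is why the Notepad file is 30 bytes either way, while a 2-byte-per-character encoding doubles it.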

TheLostMind