-2

I am trying to work out which data types are more memory-efficient. I know an int is 4 bytes and a char is one byte.

  1. an object which contains five integers (4 * 5 = 20 bytes)
  2. a String object which has ten characters. ( Suppose it has 10 characters 10 * 1 = 10 bytes)

Am I right? Which one do you think is better?

arshajii
heeh

5 Answers

10

The objective answer first:

  • Primitive data types are documented here
  • Strings are more complicated because the JVM can intern them. See a good explanation here

The not so objective answer: pick the data structure that makes for the best design for your application.

If you have a specific constraint in your application, post more details about the data you need to handle and the constraints you have.

Community
Christian Garbin
7

A String is not just an array of characters; it is an independent object with fields other than its backing char[]. For example, String has three int fields: offset, count and hash. The empty string, therefore, is generally 16 bytes (since we also need to take the char[] field into account) plus the normal 8 bytes of object overhead. Also note that a char[] is itself an object, with an int length field and its own object overhead. Once you have taken all this into account, you can add the two (not one!) bytes per char.

So, for a 10-character string:

  • 3 int fields: 12 bytes
  • char[] field: 8 bytes
    • int field: 4 bytes
    • object overhead: 8 bytes
  • 10 characters: 20 bytes
  • object overhead: 8 bytes

This comes out to about 60 bytes. I say "about" because some of this is dependent on the VM.
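To make the arithmetic explicit, here is a minimal sketch that just adds up the items in the list above. The class and constant names are made up for illustration, and every size is the estimate from this answer rather than something measured; actual values depend on the VM.

```java
// Back-of-the-envelope tally of the estimate above.
// All sizes are the assumed figures from the list, not measurements.
final class StringFootprintEstimate {
    static final int STRING_INT_FIELDS = 3 * Integer.BYTES; // offset, count, hash: 12 bytes
    static final int CHAR_ARRAY_FIELD = 8;                   // char[] field, as listed above
    static final int ARRAY_LENGTH_FIELD = Integer.BYTES;     // the array's int length field: 4 bytes
    static final int OBJECT_OVERHEAD = 8;                    // assumed per-object overhead

    // Estimated bytes for a String holding charCount characters.
    static int estimate(int charCount) {
        int stringObject = STRING_INT_FIELDS + CHAR_ARRAY_FIELD + OBJECT_OVERHEAD;
        int charArray = ARRAY_LENGTH_FIELD + OBJECT_OVERHEAD + charCount * Character.BYTES;
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        System.out.println(estimate(10)); // prints 60
    }
}
```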

arshajii
  • 1
    +1 for mentioning the object overhead that no one else has mentioned so far – Abbas Gadhia Aug 12 '13 at 02:56
  • However it would be nice if you can talk a little bit about "memory holes" aka object alignment along word boundaries http://stackoverflow.com/a/258150/638670 – Abbas Gadhia Aug 12 '13 at 03:03
1

You are incorrect about chars in Java: since they are designed to hold 16-bit Unicode (UTF-16) code units, they take two bytes each, not one. In the end, the raw data of both representations takes the same amount of memory: 20 bytes.
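As a quick check of the raw data sizes only (ignoring any object overhead), the sketch below uses the `Integer.BYTES` and `Character.BYTES` constants available since Java 8. The class and method names are just for illustration.

```java
// Raw payload sizes only; object headers and padding are deliberately ignored.
final class PrimitiveSizes {
    // Bytes needed for the values of `count` ints.
    static int rawBytesOfInts(int count) {
        return count * Integer.BYTES;   // Integer.BYTES is 4
    }

    // Bytes needed for the values of `count` chars.
    static int rawBytesOfChars(int count) {
        return count * Character.BYTES; // Character.BYTES is 2
    }

    public static void main(String[] args) {
        System.out.println(rawBytesOfInts(5));   // prints 20
        System.out.println(rawBytesOfChars(10)); // prints 20
    }
}
```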

You should pick the data type that makes the most sense to you, the designer of your classes, and to the readers of your code. Memory concerns should not be at the top of your design priorities unless the number of objects that you need threatens to overflow your available memory. Even then you should do careful memory profiling before you optimize.

Sergey Kalinichenko
0

Characters are 2 bytes in size. They are equivalent to an unsigned short, so a character's value can range between [0, 65535] inclusive.
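If you want to see the range for yourself, a small sketch like this one works (the class and method names are only illustrative). It also shows that a char wraps around to 0 when incremented past its maximum, as an unsigned 16-bit value would.

```java
// Demonstrates the unsigned 16-bit range of char.
final class CharRange {
    static int minValue() {
        return Character.MIN_VALUE; // '\u0000' widened to int
    }

    static int maxValue() {
        return Character.MAX_VALUE; // '\uffff' widened to int
    }

    // Incrementing the largest char wraps around to the smallest one.
    static int wrapAround() {
        char c = Character.MAX_VALUE;
        c++;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(minValue());   // prints 0
        System.out.println(maxValue());   // prints 65535
        System.out.println(wrapAround()); // prints 0
    }
}
```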

The number of bytes a String occupies is actually:

string.length() * 2

So for your example, a 10 character string occupies 20 bytes, not 10 bytes.

This would be just the string content. There are other variables within the String class which will occupy more bytes of course. And even an empty object occupies a certain number of bytes that will vary based on the JVM implementation.

However, just the character content will occupy 2 bytes per character.
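Here is a minimal sketch of that calculation, cross-checked by encoding the string as UTF-16 without a byte-order mark. The class and method names are illustrative. Note this describes the logical content size; as far as I know, JDK 9 and later may store Latin-1-only strings more compactly internally, so treat it as an upper bound for the content on newer JVMs.

```java
import java.nio.charset.StandardCharsets;

// Size of a string's character content, excluding all object overhead.
final class StringContentSize {
    // length() counts 16-bit chars, so each one contributes two bytes.
    static int contentBytes(String s) {
        return s.length() * Character.BYTES;
    }

    // Same figure obtained by encoding to UTF-16 big-endian (no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String s = "abcdefghij"; // 10 characters
        System.out.println(contentBytes(s)); // prints 20
        System.out.println(utf16Bytes(s));   // prints 20
    }
}
```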

But don't worry about this, as it's most assuredly premature optimization. Clean code is usually more important than lightning-fast code. Pick appropriate data types and write code that's easy to read and follow. These things are more important.

If you are worried about holding large strings in memory, consider changing your approach. The most common problem I see with large strings is new programmers reading an entire file into memory.

If you are doing this, try processing data line by line. Only hold the smallest unit you need in memory at a time, perform your processing, and move on.
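As a rough sketch of what line-by-line processing can look like, the example below finds the longest line in a file while keeping only the current line in memory. The `longestLine` helper and the temporary file are made up for the demonstration; swap in whatever per-line work you actually need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class LineByLine {
    // Returns the length of the longest line, holding one line at a time.
    static int longestLine(Path file) throws IOException {
        int longest = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Process the current line, then let it become garbage.
                longest = Math.max(longest, line.length());
            }
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample input written to a temporary file.
        Path tmp = Files.createTempFile("lines", ".txt");
        try {
            Files.write(tmp, Arrays.asList("a", "bbb", "cc"), StandardCharsets.UTF_8);
            System.out.println(longestLine(tmp)); // prints 3
        } finally {
            Files.delete(tmp);
        }
    }
}
```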

William Morrison
  • 1
    -1: this is simply wrong: *"So for your example, a 10 character string occupies 20 bytes, not 10 bytes"*. Things are way more complicated. There are other fields, there are other overheads associated with objects, there might be some padding, the string might be interned... – Bruno Reis Aug 12 '13 at 02:46
  • Please allow me to be more specific. **Just the string content** occupies 20 bytes. As I said in my previous comment. I will add this to my answer. – William Morrison Aug 12 '13 at 02:55
  • Now, I've updated my answer. It is accurate now. Please remove the -1 at your convenience. – William Morrison Aug 12 '13 at 02:58
0

I know int is 4 bytes

correct

and char is one byte.

A char is a 16-bit unsigned integer, so 2 bytes

an object which contains five integers (4 * 5 = 20 bytes)

An object has a header, which is 12 bytes on a 32-bit JVM and 16 bytes on a 64-bit JVM. Objects are 8-byte aligned by default, or 16- or 32-byte aligned if the alignment setting is changed.

This means a new int[5] uses 16 + 20 + 4 (padding) = 40 bytes

a String object which has ten characters. ( Suppose it has 10 characters 10 * 1 = 10 bytes)

A String uses ~24 bytes with header and length fields etc, but it wraps a char[] which contains the actual chars, which is a further 16+20+4 = 40 bytes.
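The arithmetic behind these figures can be sketched as follows. The constants are simply the numbers quoted above (a 16-byte header, 8-byte alignment, and roughly 24 bytes for the String object itself), so treat them as assumptions for one particular JVM configuration; the class and method names are illustrative.

```java
// Reproduces the size arithmetic above under the stated assumptions.
final class AlignedSizes {
    static final int ARRAY_HEADER = 16;  // assumed header size from the text
    static final int ALIGNMENT = 8;      // assumed default object alignment
    static final int STRING_OBJECT = 24; // the "~24 bytes" quoted for a String

    // Rounds size up to the next multiple of alignment.
    static long alignUp(long size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Header plus elements, padded to the alignment boundary.
    static long arraySize(int length, int elementBytes) {
        return alignUp(ARRAY_HEADER + (long) length * elementBytes, ALIGNMENT);
    }

    // String object plus its backing char[].
    static long stringSize(int chars) {
        return STRING_OBJECT + arraySize(chars, Character.BYTES);
    }

    public static void main(String[] args) {
        System.out.println(arraySize(5, Integer.BYTES));    // int[5]: prints 40
        System.out.println(arraySize(10, Character.BYTES)); // char[10]: prints 40
        System.out.println(stringSize(10));                 // prints 64
    }
}
```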

A simple way to check this is to use the following. Make sure you run it with -XX:-UseTLAB, which makes the memory accounting more accurate (but makes allocation slower in multi-threaded programs).

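public class StringMemoryCheck {
    // Run with -XX:-UseTLAB so each allocation shows up in freeMemory() immediately.
    public static void main(String... ignored) {
        char[] chars = new char[10];
        long used = memoryUsed();
        // new String(char[]) copies the array, so this allocates a String plus its backing array.
        String s = new String(chars);
        long diff = memoryUsed() - used;
        if (diff == 0) throw new AssertionError("You must set -XX:-UseTLAB on the command line");
        System.out.printf("Creating a String of 10 characters used %,d bytes of memory%n", diff);
    }

    // Heap currently in use: total heap size minus free heap.
    private static long memoryUsed() {
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }
}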

prints

Creating a String of 10 characters used 64 bytes of memory
Peter Lawrey