1

If you declare an array of strings in Java like this:

String[] words;

That just gives you a reference, correct?

Now, I am coming from a background in C, so I know that an array of "strings" in C is a pointer to pointers, or an array of arrays. However, I am wondering how the JVM handles this declaration. Is it just a single reference? Then, when you give it sufficient memory, can the strings have different lengths as well?

It's kind of hard for me to describe, but I know that strings are just arrays of characters, so how does the JVM determine how long the strings are before allocating them? Does it reallocate a whole new array of strings with an updated string length?

char array[6][6]; //in C this is necessary because it needs to know the column and row length

similar to this

char* array[5]; // you still need to malloc the slots in the array for a two dimensional length to be achieved

But in Java I don't get how this can work:

    String line = null;
    try {
        while ((line = bfr.readLine()) != null) {
            if (StringUtils.isBlank(line))
                continue;

            System.out.println(line);
            String[] chunks = line.split(","); //this line right here, how does JVM allocate proper memory
            MindsparkPartnerCode record = new MindsparkPartnerCode();

            record.setIEFFCode(chunks[0]);
            records.add(record);
LeatherFace

5 Answers

3

Well, you're really asking two questions here.

First of all, declaring an array (of any depth) doesn't allocate memory in Java, whereas in your C example you're declaring and defining an array, which does allocate memory.

Java:

String[] words; // Just a reference (null if it is a field, unassigned if it is a local) -- no array allocated

C:

char array[6][10]; // *Does* allocate 60 bytes of memory, usually on the stack.

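In Java, no object is allocated until something creates one, most commonly the new operator (string literals, array initializers, and library methods such as split create objects too). Primitives are the exception: their values are stored directly in the variable rather than behind a reference.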

Foo bar; // Just a reference
bar = new Foo(); // NOW memory has been allocated.

new returns a reference to a new object. Think of everything that isn't a primitive as a reference (this includes arrays!).


Strings are no different.

String[] foo;

... is just a reference to an array of String object references. Nothing more.

Even when you create the array...

foo = new String[20];

... Java allocates roughly 20 * sizeof(JavaReference) bytes, plus a small object header that records the array length (where JavaReference is whatever underlying type the JVM is using to represent references). Every slot starts out as null, and the size of the array is now known.
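
A minimal sketch of that step, assuming nothing beyond the standard library (the class and method names are made up for illustration): the new array already knows its length, and each slot holds a null reference until a string is assigned to it.

```java
class ArrayAllocationDemo {
    // Creates the array object only; no String objects exist yet.
    static String[] emptySlots(int size) {
        return new String[size];
    }

    public static void main(String[] args) {
        String[] foo = emptySlots(20);
        System.out.println(foo.length); // prints 20: the array stores its own length
        System.out.println(foo[0]);     // prints null: the slot refers to nothing yet
    }
}
```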

When you actually add strings to that array...

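foo[0] = "Hello!"; // Which is roughly like...
foo[0] = new String("Hello!"); // ...except that a literal refers to a shared, pooled String rather than a fresh copy

... THAT is when you're telling the JVM how long your string is, thus telling it to allocate about strlen("Hello!") * 2 bytes of character data plus a fixed object overhead (Java strings are UTF-16 and store their length explicitly, so there is no null terminator). Since JDK 9, HotSpot can store strings that contain only Latin-1 characters using one byte per character.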

As well, remember that strings are immutable, so the JVM doesn't have to worry about realloc'ing them.
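
One nuance is easier to see in code. The sketch below (hypothetical class and helper names) compares a literal with an explicit new String and shows that an operation which looks like a modification actually returns a new object:

```java
class StringIdentityDemo {
    // Reference comparison: true only if both variables refer to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Hello!";
        String another = "Hello!";
        String copy = new String("Hello!");

        System.out.println(sameObject(literal, another)); // true: identical literals share one pooled instance
        System.out.println(sameObject(literal, copy));    // false: new always creates a distinct object
        System.out.println(literal.equals(copy));         // true: same characters
        System.out.println(literal.length());             // 6: the length is stored, not found by scanning

        String upper = literal.toUpperCase(); // returns a new String
        System.out.println(literal);          // Hello! (the original is unchanged)
        System.out.println(upper);            // HELLO!
    }
}
```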


Your question about strings is tricky with Java since Java takes what is otherwise just another class (String) and turns it into a language construct (as seen in that last code example). It's no wonder strings can be confusing when thinking in terms of memory and allocation.

Qix - MONICA WAS MISTREATED
2

In Java, a String is not an array of characters. It is a reference to a garbage-collected instance of the class java.lang.String on the heap. From the docs:

The String class represents character strings. All string literals in Java programs, such as "abc", are implemented as instances of this class.

Strings are constant; their values cannot be changed after they are created. String buffers support mutable strings. Because String objects are immutable they can be shared. For example:

     String str = "abc";

is equivalent to:

     char data[] = {'a', 'b', 'c'};
     String str = new String(data);

The class String includes methods for examining individual characters of the sequence, for comparing strings, for searching strings, for extracting substrings, and for creating a copy of a string with all characters translated to uppercase or to lowercase. Case mapping is based on the Unicode Standard version specified by the Character class.

The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuilder(or StringBuffer) class and its append method. String conversions are implemented through the method toString, defined by Object and inherited by all classes in Java...

Thus a String can be created from an array of characters, but it is more than an array of characters. This class is built into Java itself, so the compiler knows how to instantiate instances of this class from string literals typed into your code.

Thus when you do something like:

String[] chunks = line.split(",");

you are calling the method split on an instance of the class java.lang.String. It returns an array of java.lang.String objects, which it allocates itself (both the array and the strings). Eventually these will all be garbage collected when they are no longer referenced.
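
As a small sketch (the class name, helper, and sample input are invented for illustration), the caller never states any sizes; split hands back an array whose elements each carry their own length:

```java
class SplitDemo {
    // Thin wrapper around String.split so the result is easy to inspect.
    static String[] chunksOf(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] chunks = chunksOf("dog,abcdefghijklmnopqrstuvwxyz,x");
        System.out.println(chunks.length); // prints 3
        for (String chunk : chunks) {
            // prints "dog -> 3", "abcdefghijklmnopqrstuvwxyz -> 26", "x -> 1"
            System.out.println(chunk + " -> " + chunk.length());
        }
    }
}
```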

dbc
  • So, when split is used, Java allocates the memory as each new string comes into the array? Like, if one string comes in called "dog", the length of that string is 3, then the next string that returns from split is "abcdefghijklmnopqrstuvwxyz" with a length of 26. That must be a lot of work for the JVM – LeatherFace Dec 07 '14 at 22:12
  • I believe `split()` allocates the array + all the substrings inside itself and returns them to the caller, by building an `ArrayList` then returning that with `toArray()`. Found some source here: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/regex/Pattern.java#Pattern.split%28java.lang.CharSequence%2Cint%29 and here: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.split%28java.lang.String%29 – dbc Dec 07 '14 at 22:17
  • @LeatherFace - yes, it can be some work. It's done for several reasons: 1) To properly support Unicode. 2) To make the string-buffer overruns that plague C and C++ code impossible. 3) To allow string literals to be shared without having to be copied. – dbc Dec 07 '14 at 22:19
  • @LeatherFace - however, if you're worried about the performance of using Strings, put that behind you as you begin to code in Java. Write your Java string processing code in the most natural and straightforward way and let the garbage collector work for you. Later, if you find performance problems in your code, you can optimize. – dbc Dec 07 '14 at 22:22
  • very interesting, thanks for all the insight.. much appreciated – LeatherFace Dec 07 '14 at 22:24
1
 String[] chunks = line.split(","); //this line right here, how does JVM allocate proper memory

By the time this statement is complete, there will be N+1 new objects you can reference:

  • One String[] array object
  • N String objects, one sitting in each slot in the array
bmargulies
0

I have some C background as well. In Java, an array of something is much the same as in C: it's an array of pointers (or of primitive values such as ints). The size of the array must be given to Java as well, as in

String[] words = new String[10];

Your example declares words as a variable that can refer to an array of strings but currently refers to nothing (null). My example makes words refer to an array of 10 references to strings, each of them initially null.
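
To connect this with the C picture of char* array[5], here is a rough sketch (hypothetical class and method names) of a ragged char[][], where the outer array is created first and each row is allocated separately with its own length:

```java
class RaggedArrayDemo {
    // Builds one row per word; rows may have different lengths.
    static char[][] rowsOf(String... words) {
        char[][] rows = new char[words.length][]; // outer array only; every row is null so far
        for (int i = 0; i < words.length; i++) {
            rows[i] = words[i].toCharArray();     // each row is its own array object
        }
        return rows;
    }

    public static void main(String[] args) {
        char[][] rows = rowsOf("dog", "giraffe");
        System.out.println(rows.length);    // prints 2
        System.out.println(rows[0].length); // prints 3
        System.out.println(rows[1].length); // prints 7
    }
}
```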

Have a look at the API of java.util.Arrays:

https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html

Jemolah
0

It allocates an array of references to other heap objects. A capital-S String in Java is itself a reference to an object on the heap, which in turn contains a reference to a char[] (a byte[] since JDK 9), which may have arbitrary size.

Louis Wasserman