
I have a big string containing at most 100,000 characters. Instead of using `string.charAt(index)` to read a character from the string, I converted the string into a char array using `string.toCharArray()` and now work with `charArray[index]`, which takes less time than `string.charAt(index)`. However, I want to know: is there any other way that is faster than `string.toCharArray()`?
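For reference, a minimal sketch of the kind of comparison described above. This is a naive single-run timing, not a proper benchmark (JIT warm-up, dead-code elimination, and GC can all skew results, which is why a harness like JMH is normally recommended); the class name and string contents are made up for illustration.

```java
public class CharAccessTiming {
    public static void main(String[] args) {
        // Build a test string of 100,000 lowercase letters,
        // matching the size described in the question.
        StringBuilder sb = new StringBuilder(100_000);
        for (int i = 0; i < 100_000; i++) {
            sb.append((char) ('a' + i % 26));
        }
        String s = sb.toString();

        // Access every character via charAt(i).
        long t0 = System.nanoTime();
        long sum1 = 0;
        for (int i = 0; i < s.length(); i++) {
            sum1 += s.charAt(i);
        }
        long charAtNanos = System.nanoTime() - t0;

        // Copy once with toCharArray(), then index the array directly.
        long t1 = System.nanoTime();
        char[] chars = s.toCharArray();
        long sum2 = 0;
        for (int i = 0; i < chars.length; i++) {
            sum2 += chars[i];
        }
        long arrayNanos = System.nanoTime() - t1;

        // The sums must match; the timings are only indicative.
        System.out.println("charAt:      " + charAtNanos + " ns (sum " + sum1 + ")");
        System.out.println("toCharArray: " + arrayNanos + " ns (sum " + sum2 + ")");
    }
}
```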

ravi
  • How did you determine that `string.charAt(index)` is slower? I wouldn't think it would be slower. – Louis Wasserman Mar 24 '12 at 11:35
  • 1
    For your convenience, maybe I can suggest using [StringReader](http://docs.oracle.com/javase/6/docs/api/java/io/StringReader.html) – Jakub Zaverka Mar 24 '12 at 11:37
  • I measured with `System.currentTimeMillis()`, and `string.charAt(index)` also uses array indexing internally, so it is better to have the array directly. – ravi Mar 24 '12 at 13:22
  • 1
    @Ravi Joshi: *"using string.charAt[index] to read a character from the string"*... String's *charAt* does *not* read a character from the String. It reads a Java *char*, which is totally inadequate to hold all the Unicode characters. A character, since Java 1.4, may need more than one Java *char* to be represented using *char*. A website like Stackoverflow, for example, fully supports Unicode and all the Unicode codepoints. Java's *char* primitive does not. – TacticalCoder Mar 24 '12 at 14:33
  • @TacticalCoder: Thank you for this information; I was not aware of this fact. However, in my case the string is composed of only lowercase letters, i.e. a-z. – ravi Mar 24 '12 at 20:30
  • @TacticalCoder: what you say is wrong. A char primitive IS a Unicode character. Maybe you are confusing it with the byte primitive? From the official doc: "The char data type is a single 16-bit Unicode character." Source: http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html Example: char rr = '華'; – Pierre Henry Nov 01 '13 at 14:34
  • @Pierre Henry: no, I am not confusing anything ; ) Many Unicode codepoints need two Java char to be encoded. By using the .charAt(...) method on such a Unicode codepoint you'd be reading only part of that codepoint. That is why in this day and age methods like *charAt* and *length* are mostly broken. You want to use *codePointAt* instead. Example: how do you put the character 'U+1040B' inside a Java *char*? You simply can't do it. See answer from 100K+ SO user here: http://stackoverflow.com/questions/12280801 (*"...a Java char holds a UTF-16 code unit instead of a Unicode character..."*) – TacticalCoder Nov 03 '13 at 17:45
  • Yes, you are right, sorry about that. I was convinced that Unicode used only 16 bits at most. Thanks for pointing this out. I am not looking forward to having to work with those "astral" planes ;) – Pierre Henry Nov 07 '13 at 15:42
  • This problem has already been discussed on Stack Overflow: http://stackoverflow.com/questions/8894258/fastest-way-to-iterate-over-all-the-chars-in-a-string – it's me Sep 04 '14 at 16:38
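The supplementary-character point raised in the comments above can be demonstrated directly. A minimal sketch (the class name is made up; U+1040B is DESERET CAPITAL LETTER EF, which UTF-16 encodes as the surrogate pair D801 DC0B, i.e. two Java chars):

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // One Unicode character outside the Basic Multilingual Plane,
        // written as its two UTF-16 surrogate char units.
        String s = "\uD801\uDC0B";

        System.out.println(s.length());                      // 2 char units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(Integer.toHexString(s.codePointAt(0))); // 1040b

        // charAt(0) returns only the high surrogate, not a full character:
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
    }
}
```

For a string known to contain only a-z, as in the question, every character fits in one `char`, so `charAt`/array indexing is safe there.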

1 Answer


I do not think there is a faster way. But please correct me!

A String instance is backed by a char array. charAt() does some index checks which may be the cause for it being slower than working with the array returned by toCharArray(). toCharArray() simply does a System.arraycopy() of the backing array.

nansen
  • While using `string.charAt(index)`, is a new `char[string.length()]` created every time? If yes, that may be the reason for its lower performance. – ravi Mar 24 '12 at 13:26